Low-Level System Information

Machine Interface

The machine interface describes the specific use of the Power ISA
64-bit features to implement the ELF ABI version 2.

Processor Architecture

This ABI is predicated on, at a minimum, Power ISA version 2.7 and
contains additional implementation characteristics.

All OpenPOWER instructions that are defined by the Power
Architecture can be assumed to be implemented and to work as specified.
ABI-conforming implementations must provide these instructions through
software emulation if they are not provided by the OpenPOWER-compliant
processor.

In addition, the instruction specification must meet additional
implementation-defined specifics as commonly required by the OpenPOWER
specification.

OpenPOWER-compliant processors may support additional instructions
beyond the published Power Instruction Set Architecture (ISA) and may
include optional Power Architecture instructions.

This ABI does not explicitly impose any performance constraints on
systems.

Data Representation

Byte Ordering

The following standard data formats are recognized:

- 8-bit byte
- 16-bit halfword
- 32-bit word
- 64-bit doubleword
- 128-bit quadword

In little-endian byte ordering, the least-significant byte is
located in the lowest addressed byte position in memory (byte 0). This
byte ordering is alternately referred to as least-significant byte
(LSB) ordering.

In big-endian byte ordering, the most-significant byte is located
in the lowest addressed byte position in memory (byte 0). This byte
ordering is alternately referred to as most-significant byte (MSB)
ordering.

A specific OpenPOWER-compliant processor implementation must
state which type of byte ordering is to be used.

Note (MSR[LE|SLE]): Although it may be possible on some systems to
modify the active byte ordering of an application process through
application-accessible configuration controls or through system
calls, applications that change active byte ordering
during the course of execution do not conform to this ABI.

The figures from "Little-Endian Bit and Byte Numbering Example" through
"Little-Endian Bit and Byte Numbering in Quadwords" show the conventions
assumed in little-endian byte ordering at the bit and byte levels. These
conventions are applied to integer and floating-point data types. As
shown in the numbering example, byte numbers are indicated
in the upper corners, and bit numbers are indicated in the lower
corners.
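Little-Endian Bit and Byte Numbering Example
Each cell shows the little-endian byte number (upper corner) and the bit numbers at which the byte ends and starts (lower corners).

Little-Endian Bit and Byte Numbering in Halfwords
Byte 1 (MSB): bits 15–8. Byte 0 (LSB): bits 7–0.

Little-Endian Bit and Byte Numbering in Words
Bytes 3 (MSB), 2, 1, 0 (LSB): bits 31–24, 23–16, 15–8, 7–0.

Little-Endian Bit and Byte Numbering in Doublewords
Bytes 7 (MSB), 6, 5, 4: bits 63–56, 55–48, 47–40, 39–32.
Bytes 3, 2, 1, 0 (LSB): bits 31–24, 23–16, 15–8, 7–0.

Little-Endian Bit and Byte Numbering in Quadwords
Bytes 15 (MSB), 14, 13, 12: bits 127–120, 119–112, 111–104, 103–96.
Bytes 11, 10, 9, 8: bits 95–88, 87–80, 79–72, 71–64.
Bytes 7, 6, 5, 4: bits 63–56, 55–48, 47–40, 39–32.
Bytes 3, 2, 1, 0 (LSB): bits 31–24, 23–16, 15–8, 7–0.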

The figures from "Big-Endian Bit and Byte Numbering Example" through
"Big-Endian Bit and Byte Numbering in Quadwords" show the conventions
assumed in big-endian byte ordering at the bit and byte levels. These
conventions are applied to integer and floating-point data types. As
shown in the numbering example, byte numbers are indicated
in the upper corners, and bit numbers are indicated in the lower
corners.
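Big-Endian Bit and Byte Numbering Example
Each cell shows the big-endian byte number (upper corner) and the bit numbers at which the byte starts and ends (lower corners).

Big-Endian Bit and Byte Numbering in Halfwords
Byte 0 (MSB): bits 0–7. Byte 1 (LSB): bits 8–15.

Big-Endian Bit and Byte Numbering in Words
Bytes 0 (MSB), 1, 2, 3 (LSB): bits 0–7, 8–15, 16–23, 24–31.

Big-Endian Bit and Byte Numbering in Doublewords
Bytes 0 (MSB), 1, 2, 3: bits 0–7, 8–15, 16–23, 24–31.
Bytes 4, 5, 6, 7 (LSB): bits 32–39, 40–47, 48–55, 56–63.

Big-Endian Bit and Byte Numbering in Quadwords
Bytes 0 (MSB), 1, 2, 3: bits 0–7, 8–15, 16–23, 24–31.
Bytes 4, 5, 6, 7: bits 32–39, 40–47, 48–55, 56–63.
Bytes 8, 9, 10, 11: bits 64–71, 72–79, 80–87, 88–95.
Bytes 12, 13, 14, 15 (LSB): bits 96–103, 104–111, 112–119, 120–127.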
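
The two byte orderings can be told apart at run time by inspecting byte 0 of a known word. The following is a minimal sketch, not part of the ABI; the function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Return the byte stored at the lowest address (byte 0) of a 32-bit word. */
unsigned char lowest_addressed_byte(uint32_t word)
{
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);   /* copy out the in-memory image */
    return bytes[0];
}

/* Return 1 if the running process uses little-endian (LSB) ordering, else 0. */
int is_little_endian(void)
{
    /* In 0x01020304 the most-significant byte is 0x01 and the
       least-significant byte is 0x04. */
    return lowest_addressed_byte(UINT32_C(0x01020304)) == 0x04;
}
```

On a little-endian process byte 0 holds 0x04; on a big-endian process it holds 0x01.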
In the Power ISA, the figures are generally only shown in
big-endian byte order. The bits in this data format specification are
numbered from left to right (MSB to LSB).

Note (FPSCR Formats): As of Power ISA version 2.05, the
FPSCR is extended from 32 bits to 64 bits. The fields of the original
32-bit FPSCR are now held in bits 32–63 of the 64-bit FPSCR. The
assembly instructions that operate upon the 64-bit FPSCR either have
a W instruction field added to select the operative word for the
instruction (for example, mtfsfi) or are extended to
operate upon the entire 64-bit FPSCR (for example, mffs). Fields of
the FPSCR that represent 1 or
more bits are referred to by field number with an indication of the
operative word rather than by bit number.

Fundamental Types

The Scalar Types table describes the ISO C scalar
types, and the Vector Types table describes the vector types of
the POWER SIMD vector programming API. Each type has a required
alignment, which is indicated in the Alignment column. Use of these
types in data structures must follow the alignment specified, in the
order encountered, to ensure consistent mapping. When using variables
individually, stricter alignment may be imposed if it has
optimization benefits.

Regardless of the alignment rules for the allocation of data
types, pointers to both aligned and unaligned data of each data type
shall return the value corresponding to a data type starting at the
specified address when accessed with either the pointer dereference
operator * or the array reference operator [ ].
Scalar Types

| Type | ISO C Types | sizeof | Alignment | Description |
| --- | --- | --- | --- | --- |
| Boolean | _Bool | 1 | Byte | Boolean |
| Character | char, unsigned char | 1 | Byte | Unsigned byte |
| | signed char | 1 | Byte | Signed byte |
| Enumeration | signed enum | 4 | Word | Signed word |
| | unsigned enum | 4 | Word | Unsigned word |
| Integral | int, signed int | 4 | Word | Signed word |
| | unsigned int | 4 | Word | Unsigned word |
| | long int | 8 | Doubleword | Signed doubleword |
| | signed long int | 8 | Doubleword | Signed doubleword |
| | unsigned long int | 8 | Doubleword | Unsigned doubleword |
| | long long int, signed long long int | 8 | Doubleword | Signed doubleword |
| | unsigned long long int | 8 | Doubleword | Unsigned doubleword |
| | short int, signed short int | 2 | Halfword | Signed halfword |
| | unsigned short int | 2 | Halfword | Unsigned halfword |
| | __int128, signed __int128 | 16 | Quadword | Signed quadword |
| | unsigned __int128 | 16 | Quadword | Unsigned quadword |
| Pointer | any * | 8 | Doubleword | Data pointer |
| | any (*) ( ) | 8 | Doubleword | Function pointer |
| Binary Floating-Point | float | 4 | Word | Single-precision float |
| | double | 8 | Doubleword | Double-precision float |
| | long double | 16 | Quadword | Extended- or quad-precision float |

Notes on the scalar types:

- A NULL pointer has all bits set to zero.
- A Boolean value is represented as a byte with a value of 0
  or 1. If a byte with a value other than 0 or 1 is evaluated as a
  boolean value (for example, through the use of unions), the
  behavior is undefined.
- If an enumerated type contains a negative value, it is
  compatible with and has the same representation and alignment as
  int. Otherwise, it is compatible with and has the same
  representation and alignment as an unsigned int.
- For each real floating-point type, there is a corresponding
  imaginary type with the same size and alignment, and there is a
  corresponding complex type. The complex type has the same
  alignment as the real type and is twice the size; the
  representation is the real part followed by the imaginary
  part.
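
A few rows of the Scalar Types table, and the note on complex types, can be compared against what a compiler actually produces. The sketch below assumes a C11 compiler (for `_Alignof`) and expresses each alignment in bytes (Byte = 1, Word = 4, Doubleword = 8, Quadword = 16); the function names are hypothetical. A nonzero mismatch count simply means the build target is not laid out like this ABI.

```c
#include <complex.h>
#include <stddef.h>
#include <string.h>

/* Compare one table row (expected) with what the compiler reports. */
int row_matches(size_t expected_size, size_t expected_align,
                size_t actual_size, size_t actual_align)
{
    return expected_size == actual_size && expected_align == actual_align;
}

/* Count the checked rows that the current target disagrees with. */
int scalar_table_mismatches(void)
{
    int bad = 0;
    bad += !row_matches(1, 1, sizeof(_Bool), _Alignof(_Bool));
    bad += !row_matches(2, 2, sizeof(short), _Alignof(short));
    bad += !row_matches(4, 4, sizeof(int), _Alignof(int));
    bad += !row_matches(8, 8, sizeof(long), _Alignof(long));
    bad += !row_matches(8, 8, sizeof(long long), _Alignof(long long));
    bad += !row_matches(8, 8, sizeof(void *), _Alignof(void *));
    bad += !row_matches(4, 4, sizeof(float), _Alignof(float));
    bad += !row_matches(8, 8, sizeof(double), _Alignof(double));
    bad += !row_matches(16, 16, sizeof(long double), _Alignof(long double));
    return bad;
}

/* Check the complex-type note: twice the size of the real type, the same
   alignment, and the real part stored before the imaginary part. */
int complex_layout_ok(void)
{
    double complex z = 3.0 + 4.0 * I;
    double parts[2];

    if (sizeof z != 2 * sizeof(double))
        return 0;
    if (_Alignof(double complex) != _Alignof(double))
        return 0;
    memcpy(parts, &z, sizeof z);   /* view the object as two doubles */
    return parts[0] == 3.0 && parts[1] == 4.0;
}
```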
Vector Types

| Type | Power SIMD C Types | sizeof | Alignment | Description |
| --- | --- | --- | --- | --- |
| vector-128 | vector unsigned char | 16 | Quadword | Vector of 16 unsigned bytes. |
| | vector signed char | 16 | Quadword | Vector of 16 signed bytes. |
| | vector bool char | 16 | Quadword | Vector of 16 bytes with a value of either 0 or 2^8 - 1. |
| | vector unsigned short | 16 | Quadword | Vector of 8 unsigned halfwords. |
| | vector signed short | 16 | Quadword | Vector of 8 signed halfwords. |
| | vector bool short | 16 | Quadword | Vector of 8 halfwords with a value of either 0 or 2^16 - 1. |
| | vector unsigned int | 16 | Quadword | Vector of 4 unsigned words. |
| | vector signed int | 16 | Quadword | Vector of 4 signed words. |
| | vector bool int | 16 | Quadword | Vector of 4 words with a value of either 0 or 2^32 - 1. |
| | vector unsigned long (see note), vector unsigned long long | 16 | Quadword | Vector of 2 unsigned doublewords. |
| | vector signed long (see note), vector signed long long | 16 | Quadword | Vector of 2 signed doublewords. |
| | vector bool long (see note), vector bool long long | 16 | Quadword | Vector of 2 doublewords with a value of either 0 or 2^64 - 1. |
| | vector unsigned __int128 | 16 | Quadword | Vector of 1 unsigned quadword. |
| | vector signed __int128 | 16 | Quadword | Vector of 1 signed quadword. |
| | vector _Float16 | 16 | Quadword | Vector of 8 half-precision floats. |
| | vector float | 16 | Quadword | Vector of 4 single-precision floats. |
| | vector double | 16 | Quadword | Vector of 2 double-precision floats. |

Note: The vector long types are deprecated due to their
ambiguity between 32-bit and 64-bit environments. The use
of the vector long long types is preferred.

Elements of Boolean vector data types must have a value
corresponding to all bits set to either 0 or 1. The result of
computations on Boolean vectors, where at least one element is not
well formed, is undefined for all vector elements. (An element is
well formed if it has all bits set to 0 or all bits set to 1.)

Decimal Floating-Point (ISO TR 24732 Support)

The decimal floating-point data type is used to specify variables
corresponding to the IEEE 754-2008 densely packed, decimal
floating-point format.

IBM EXTENDED PRECISION Type

| Type | ISO C Types | sizeof | Alignment | Description |
| --- | --- | --- | --- | --- |
| IBM EXTENDED PRECISION | long double | 16 | Quadword | Two double-precision floats. |
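
The IBM EXTENDED PRECISION format represents one value as a pair of doubles, and this ABI's description of the format requires the high-order double to equal the sum of the two parts rounded to nearest double. The sketch below illustrates that invariant with Knuth's TwoSum, a standard technique that this ABI does not itself prescribe. The struct and function names are hypothetical, the struct only models the pair (it is not the long double type), and the code assumes IEEE double arithmetic in round-to-nearest mode without excess intermediate precision.

```c
/* Model of a double-double value: the represented number is hi + lo. */
struct double_pair {
    double hi;   /* high-order part, first in storage */
    double lo;   /* low-order part */
};

/* Knuth's TwoSum: hi is a + b rounded to nearest double, and lo is the
   exact rounding error, so hi + lo equals a + b exactly. */
struct double_pair two_sum(double a, double b)
{
    struct double_pair r;
    double s  = a + b;
    double bv = s - a;    /* the part of b that made it into s */
    double av = s - bv;   /* the part of a that made it into s */

    r.hi = s;
    r.lo = (a - av) + (b - bv);
    return r;
}

/* The invariant from the text: hi equals (hi + lo) rounded to nearest. */
int is_canonical(struct double_pair p)
{
    return p.hi + p.lo == p.hi;
}
```

For example, `two_sum(1.0, 0x1p-60)` keeps the small term in `lo` instead of discarding it, which is where the extra precision of the format comes from.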

IEEE BINARY 128 EXTENDED PRECISION

IEEE BINARY 128 EXTENDED PRECISION Type

| Type | ISO C Types | sizeof | Alignment | Description | Notes |
| --- | --- | --- | --- | --- | --- |
| IEEE BINARY 128 EXTENDED PRECISION | long double | 16 | Quadword | IEEE 128-bit quad-precision float. | |
| IEEE BINARY 128 EXTENDED PRECISION | _Float128 | 16 | Quadword | IEEE 128-bit quad-precision float. | 1, 2 |

Note 1: Phased in. This type is being phased in and it may
not be available on all implementations.

Note 2: __float128 shall be recognized as a synonym for the
_Float128 data type, and it is used interchangeably to
refer to the same type. Implementations that do not offer
support for _Float128 may provide this type with the
__float128 type only.

IBM EXTENDED PRECISION && IEEE BINARY 128 EXTENDED PRECISION

Availability of the long double data type is subject to
conformance to a long double standard where the IBM EXTENDED PRECISION
format and the IEEE BINARY 128 EXTENDED PRECISION format are mutually
exclusive.

IEEE BINARY 128 EXTENDED PRECISION || IBM EXTENDED PRECISION

This ABI provides the following choices for implementation of
long double in compilers and systems. The preferred implementation for
long double is the IEEE 128-bit quad-precision binary floating-point
type.

IEEE BINARY 128 EXTENDED PRECISION

- Long double is implemented as an IEEE 128-bit quad-precision
  binary floating-point type in accordance with the applicable IEEE
  floating-point standards.
- Support is provided for all IEEE standard features.
- IEEE128 quad-precision values are passed in VMX parameter
  registers.
- With some compilers, _Float128 can be used to access IEEE128
  independent of the floating-point representation chosen for the
  long double ISO C type. However, this is not part of the C
  standard.

IBM EXTENDED PRECISION

- Support is provided for the IBM EXTENDED PRECISION format. In
  this format, double-precision numbers with different magnitudes
  that do not overlap provide an effective precision of 106 bits or
  more, depending on the value. The high-order double-precision value
  (the one that comes first in storage) must have the larger
  magnitude. The high-order double-precision value must equal the sum
  of the two values, rounded to nearest double (the Linux convention,
  unlike AIX).
- IBM EXTENDED PRECISION form provides the same range as double
  precision (about 10^-308 to 10^308) but more precision (a variable
  amount, about 31 decimal digits or more).
- As the absolute value of the magnitude decreases (near the
  denormal range), the precision available in the low-order double
  also decreases.
- When the value represented is in the subnormal or denormal
  range, this representation provides no more precision than 64-bit
  (double) floating-point.
- The actual number of bits of precision can vary. If the
  low-order part is much less than one unit of least precision (ULP)
  of the high-order part, significant bits (all 0s or all 1s) are
  implied between the significands of high-order and low-order
  numbers. Some algorithms that rely on having a fixed number of bits
  in the significand can fail when using extended precision.

This implementation differs from the IEEE 754 Standard in the
following ways:

- The software support is restricted to round-to-nearest mode.
  Programs that use extended precision must ensure that this rounding
  mode is in effect when extended-precision calculations are
  performed.
- This implementation does not fully support the IEEE special
  numbers NaN and INF. These values are encoded in the high-order
  double value only. The low-order value is not significant, but the
  low-order value of an infinity must be positive or negative
  zero.
- This implementation does not support the IEEE status flags
  for overflow, underflow, and other conditions. These flags have no
  meaning in this format.

Aggregates and Unions

The following rules for aggregates (structures and arrays) and
unions apply to their alignment and size:

- The entire aggregate or union must be aligned to its most
  strictly aligned member, which corresponds to the member with the
  largest alignment, including flexible array members.
- Each member is assigned the lowest available offset that
  meets the alignment requirements of the member. Depending on the
  previous member, internal padding can be required.
- The entire aggregate or union must have a size that is a
  multiple of its alignment. Depending on the last member, tail
  padding may be required.

In the figures that illustrate these rules, the big-endian byte offsets
are located in the upper left corners, and the little-endian byte
offsets are located in the upper right corners.

Bit Fields

Bit fields can be present in definitions of C structures and
unions. These bit fields define whole objects within the structure or
union where the number of bits in the bit field is specified.

In the Bit Field Types table, a signed range goes from
-2^(w-1) to 2^(w-1) - 1 and an unsigned range goes from 0 to
2^w - 1, where w is the width of the bit field.

Bit Field Types

| Bit Field Type | Width (w) |
| --- | --- |
| _Bool | 1 |
| signed char, unsigned char | 1–8 |
| signed short, unsigned short | 1–16 |
| signed int, unsigned int, enum | 1–32 |
| signed long, unsigned long, signed long long, unsigned long long | 1–64 |
| signed __int128, unsigned __int128 | 1–128 |
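
The signed and unsigned ranges for a bit field of width w can be computed directly. The sketch below covers widths 1 through 64 only (the __int128 rows are outside what these 64-bit return types can hold); the function names are hypothetical.

```c
#include <stdint.h>

/* Smallest value of a signed bit field of width w (1 <= w <= 64): -2^(w-1). */
int64_t signed_bitfield_min(unsigned w)
{
    /* w == 64 is special-cased because negating 1 << 63 would overflow. */
    return (w == 64) ? INT64_MIN : -((int64_t)1 << (w - 1));
}

/* Largest value of a signed bit field of width w: 2^(w-1) - 1. */
int64_t signed_bitfield_max(unsigned w)
{
    return (w == 64) ? INT64_MAX : ((int64_t)1 << (w - 1)) - 1;
}

/* Largest value of an unsigned bit field of width w: 2^w - 1. */
uint64_t unsigned_bitfield_max(unsigned w)
{
    /* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
    return (w == 64) ? UINT64_MAX : ((uint64_t)1 << w) - 1;
}
```

For a 5-bit field this gives a signed range of -16 to 15 and an unsigned range of 0 to 31.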

Bit fields can be signed or unsigned, of type short, int, long,
or long long. However, bit fields shall have the same range for each
corresponding type. For example, signed short must have the same range
as unsigned short. All members of structures and unions, including bit
fields, must comply with the size and alignment rules. The following
additional size and alignment rules apply to bit fields:

- The allocation of bit fields is determined by the system
  endianness. For little-endian implementations, the bit allocation
  is from the least-significant (right) end to the most-significant
  (left) end. The reverse is true for big-endian implementations; the
  bit allocation is from the most-significant (left) end to the
  least-significant (right) end.
- A bit field cannot cross its unit boundary; it must occupy
  part or all of the storage unit allocated for its declared
  type.
- If there is enough space within a storage unit, bit fields
  must share the storage unit with other structure members, including
  members that are not bit fields. All the structure members
  occupy different parts of the storage unit.
- The types of unnamed bit fields have no effect on the
  alignment of a structure or union. However, the offsets of an
  individual bit field's member must comply with the alignment rules.
  An unnamed bit field of zero width causes sufficient padding
  (possibly none) to be inserted for the next member, or the end of
  the structure if there are no more nonzero width members, to have
  an offset from the start of the structure that is a multiple of the
  size of the declared type of the zero-width member.

In the figure "Little-Endian Bit Numbering for 0x01020304", the
little-endian byte offsets are given in the upper right corners, and
the bit numbers are given in the lower corners.
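
[Figure: Little-Endian Bit Numbering for 0x01020304. Byte offsets 7–0 in the upper right corners; bit numbers 63–0 in the lower corners.]

In the figure "Big-Endian Bit Numbering for 0x01020304", the big-endian
byte offsets are given in the upper left corners, and the bit numbers
are given in the lower corners.

[Figure: Big-Endian Bit Numbering for 0x01020304. Byte offsets 0–7 in the upper left corners; bit numbers 0–63 in the lower corners.]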
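
The three alignment and size rules for aggregates can be applied by hand and compared with the compiler's own layout. The sketch below does this for one structure; `struct example` and the function names are hypothetical and are not taken from the figures of this ABI. It assumes a C11 compiler for `_Alignof`.

```c
#include <stddef.h>

/* Hypothetical structure used only to exercise the rules. */
struct example {
    char   c;   /* first member, offset 0 */
    double d;   /* needs internal padding after c */
    short  s;   /* followed by tail padding */
};

/* Round offset up to the next multiple of align. */
size_t round_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Recompute the layout from the rules and compare it with the compiler's. */
int example_layout_follows_rules(void)
{
    /* Rule 1: the aggregate takes the largest member alignment. */
    size_t align = _Alignof(char);
    if (_Alignof(double) > align) align = _Alignof(double);
    if (_Alignof(short) > align)  align = _Alignof(short);

    /* Rule 2: each member gets the lowest suitably aligned offset. */
    size_t off_c = 0;
    size_t off_d = round_up(off_c + sizeof(char), _Alignof(double));
    size_t off_s = round_up(off_d + sizeof(double), _Alignof(short));

    /* Rule 3: the total size is a multiple of the alignment. */
    size_t size = round_up(off_s + sizeof(short), align);

    return offsetof(struct example, c) == off_c
        && offsetof(struct example, d) == off_d
        && offsetof(struct example, s) == off_s
        && sizeof(struct example) == size
        && _Alignof(struct example) == align;
}
```

With the sizes and alignments of the Scalar Types table, the members land at offsets 0, 8, and 16, and the structure occupies 24 bytes with doubleword alignment.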

The byte offsets for structure and union members are shown in
the figures for this section. In the example with unnamed bit fields,
the alignment of the
structure is not affected by the unnamed short and int fields. The
named members are aligned relative to the start of the structure.
However, it is possible that the alignment of the named members is
not on optimum boundaries in memory. For instance, in an array of
that structure, the d members will not
all be on 4-byte (integer) boundaries.

Function Calling Sequence

The standard sequence for function calls is outlined in this section.
The layout of the stack frame, the parameter passing convention, and the
register usage are also described in this section. Standard library
functions use these conventions, except as documented for the register save
and restore functions.

The conventions given in this section are adhered to by C programs.
For more information about the implementation of C, see https://apps.na.collabserv.com/meetings/join?id=2897-3986.

While it is recommended that all functions use the standard
calling sequence, the requirements of the standard calling sequence are
only applicable to global functions. Different calling sequences and
conventions can be used by local functions that cannot be reached from
other compilation units, if they comply with the stack back trace
requirements. Some tools may not work with alternate calling sequences
and conventions.

Registers

Programs and compilers may freely use all registers except those
reserved for system use. The system signal handlers are responsible for
preserving the original values upon return to the original execution
path. Signals
that can interrupt the original execution path are documented in the
System V Interface Definition (SVID).

The tables in this section give an overview of the
registers that are global during program execution. The tables use the
following terms to describe register preservation rules:

Nonvolatile
- A caller can expect that the contents of all registers
  marked nonvolatile are valid after control returns from a
  function call.
- A callee shall save the contents of all registers marked
  nonvolatile before modification. The callee must restore the
  contents of all such registers before returning to its
  caller.

Volatile
- A caller cannot trust that the contents of registers
  marked volatile have been preserved across a function
  call.
- A callee need not save the contents of registers marked
  volatile before modification.

Limited-access
- The contents of registers marked limited-access have
  special preservation rules. These registers have mutability
  restricted to certain bit fields as defined by the Power ISA.
  The individual bits of these bit fields are defined by this ABI
  to be limited-access.
- Under normal conditions, a caller can expect that these
  bits have been preserved across a function call. Under the
  special conditions indicated in "Limited-Access Bits",
  a caller shall expect that these bits will have changed across
  function calls even if they have not.
- A callee may only permanently modify these bits without
  preserving the state upon entrance to the function if the
  callee satisfies the special conditions indicated in
  "Limited-Access Bits".
  Otherwise, these bits must be preserved before modification and
  restored before returning to the caller.

Reserved
- The contents of registers marked reserved are for
  exclusive use of system functions, including the ABI. In
  limited circumstances, a program or program libraries may set
  or query such registers, but only when explicitly allowed in
  this document.

Register Roles

In the 64-bit OpenPOWER Architecture, there are always 32
general-purpose registers, each 64 bits wide. Throughout this document
the symbol rN is used, where N is a register number, to refer to
general-purpose register N.

Register Roles

| Register | Preservation Rules | Purpose |
| --- | --- | --- |
| r0 | Volatile | Optional use in function linkage. Used in function prologues. |
| r1 | Nonvolatile | Stack frame pointer. |
| r2 | Nonvolatile (note 1) | TOC pointer. |
| r3–r10 | Volatile | Parameter and return values. |
| r11 | Volatile | Optional use in function linkage. Used as an environment pointer in languages that require environment pointers. |
| r12 | Volatile | Optional use in function linkage. Function entry address at the global entry point. |
| r13 | Reserved | Thread pointer. |
| r14–r31 (note 2) | Nonvolatile | Local variables. |
| LR | Volatile | Link register. |
| CTR | Volatile | Loop count register. |
| TAR | Reserved | Reserved for system use. This register should not be read or written by application software. |
| XER | Volatile | Fixed-point exception register. |
| CR0–CR1 | Volatile | Condition register fields. |
| CR2–CR4 | Nonvolatile | Condition register fields. |
| CR5–CR7 | Volatile | Condition register fields. |
| DSCR | Limited Access | Data stream prefetch control. |
| VRSAVE | Reserved | Reserved for system use. This register should not be read or written by application software. |

Note 1: Register r2 is nonvolatile with respect to calls
between functions in the same compilation unit. It is saved
and restored by code inserted by the linker resolving a
call to an external function. For more information, see
"TOC Pointer Usage".

Note 2: If a function needs a frame pointer, assigning r31 to
the role of the frame pointer is recommended.
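
The general-purpose register rows of the Register Roles table can be encoded as a small lookup, for example in a tool that checks which registers a callee must save. This is a sketch only; the enum and function names are hypothetical, and r2 is reported as nonvolatile without modelling the qualification the table attaches to it.

```c
/* Preservation rules that apply to general-purpose registers. */
enum gpr_preservation {
    GPR_VOLATILE,
    GPR_NONVOLATILE,
    GPR_RESERVED
};

/* Preservation rule for general-purpose register rN, 0 <= n <= 31. */
enum gpr_preservation gpr_rule(int n)
{
    if (n == 13)
        return GPR_RESERVED;       /* r13: thread pointer */
    if (n == 1 || n == 2 || n >= 14)
        return GPR_NONVOLATILE;    /* r1, r2, r14-r31 */
    return GPR_VOLATILE;           /* r0, r3-r12 */
}

/* Count the general-purpose registers a callee must preserve. */
int count_nonvolatile_gprs(void)
{
    int count = 0;
    for (int n = 0; n < 32; ++n)
        count += (gpr_rule(n) == GPR_NONVOLATILE);
    return count;
}
```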

TOC Pointer Usage

As described elsewhere in this ABI, the TOC pointer, r2, is
commonly initialized by the global function entry point when a function
is called through the global entry point. The function may be called
from a module other than the current function's module or from an
unknown call point, such as through a function pointer.

In those instances, it is the caller's responsibility to store
the TOC pointer, r2, in the TOC pointer doubleword of the caller's
stack frame. For references external to the compilation unit, this code
is inserted by the static linker if a function is to be resolved by the
dynamic linker. For references through function pointers, it is the
compiler's or assembler programmer's responsibility to insert
appropriate TOC save and restore code. If the function is called from
the same module as the callee, the callee must preserve the value of
r2. (Function entry conventions are described elsewhere in this ABI.)

When a function calls another function, the TOC pointer must have
a legal value pointing to the TOC base.

When global data is accessed, the TOC pointer must be available
for dereference at the point of all uses of values derived from the TOC
pointer in conjunction with the @l operator. This property is used by
the linker to optimize TOC pointer accesses. In addition, all reaching
definitions for a TOC-pointer-derived access must compute the same
definition for code to be ABI compliant.

In some implementations, non-ABI-compliant code may be processed
by providing additional linker options; for example, linker options
disabling linker optimization. However, this behavior in support of
non-ABI-compliant code is not guaranteed to be portable and supported
in all systems. Examples of compliant and noncompliant code are given
elsewhere in this ABI.

Optional Function Linkage

Except as follows, a function cannot depend on the values of
those registers that are optional in the function linkage (r0, r11, and
r12) because they may be altered by interlibrary calls:

- When a function is entered in a way to initialize its
  environment pointer, register r11 contains the environment pointer.
  It is used to support languages with access to additional
  environment context; for example, for languages that support
  lexical nesting to access its lexically nested outer
  context.
- When a function is entered through its global entry point,
  register r12 contains the entry-point address. For more
  information, see the description of dual entry points elsewhere in
  this ABI.

Stack Frame Pointer

The stack pointer always points to the lowest allocated valid
stack frame. It must maintain quadword alignment and grow toward the
lower addresses. The contents of the word at that address point to the
previously allocated stack frame when the code has been compiled to
maintain back chains. A called function is permitted to decrement it if
required.

Link Register

The link register contains the address that a called function
normally returns to. It is volatile across function calls.

Condition Register Fields

In the condition register, the bit fields CR2, CR3, and CR4 are
nonvolatile. The value on entry must be restored on exit. The other bit
fields are volatile.

This ABI requires OpenPOWER-compliant processors to implement
mfocr instructions in a manner that initializes
undefined bits of the RT result register of
mfocr instructions to one of the following
values:

- 0, in accordance with OpenPOWER-compliant processor
  implementation practice
- The architected value of the corresponding CR field in the
  mfocr instruction

Erratum:
When executing an
mfocr instruction, the POWER8 processor does not
implement the behavior described in the "Fixed-Point Invalid Forms
and Undefined Conditions" section of
POWER8 Processor User's Manual for the Single-Chip
Module. Instead, it replicates the selected condition
register field within the byte that contains it rather than
initializing to 0 the bits corresponding to the nonselected bits of
the byte that contains it. When generating code to save two condition
register fields that are stored in the same byte, the compiler must
mask the value received from
mfocr to avoid corruption of the resulting
(partial) condition register word.

This erratum does not apply to the POWER9 processor.

For more information, see
Power ISA, version 3.0 and "Fixed-Point Invalid
Forms and Undefined Conditions" in
POWER9 Processor User's Manual.

In OpenPOWER-compliant processors, floating-point and vector functions are
implemented using a unified vector-scalar model. As illustrated in the
accompanying figures, there are 64 vector-scalar
registers; each is 128 bits wide.

The vector-scalar registers can be addressed with vector-scalar
instructions, for vector and scalar processing of all 64 registers, or
with the "classic" Power floating-point instructions to refer to a
32-register subset of 64 bits per register. They can also be addressed
with VMX instructions to refer to a 32-register subset of 128-bit wide
registers.

The classic floating-point repertoire consists of 32
floating-point registers, each 64 bits wide, and an associated
special-purpose register to provide floating-point status and control.
Throughout this document, the symbol fN is used, where N is a register
number, to refer to floating-point register N.

For the purpose of function calls, the right half of VSX
registers, corresponding to the classic floating-point registers (that
is, vsr0–vsr31), is volatile.

Floating-Point Register Roles for Binary Floating-Point Types

| Register | Preservation Rules | Purpose |
| --- | --- | --- |
| f0 | Volatile | Local variables. |
| f1–f13 | Volatile | Used for parameter passing and return values of binary float types. |
| f14–f31 | Nonvolatile | Local variables. |
| FPSCR | Limited-access | Floating-Point Status and Control Register limited-access bits. Preservation rules governing the limited-access bits for the bit fields [VE], [OE], [UE], [ZE], [XE], and [RN] are presented in "Limited-Access Bits". |

DFP Support

The OpenPOWER ABI supports the decimal floating-point (DFP)
format and DFP language extensions. The default implementation of DFP
types shall be a software implementation of the IEEE DFP standard (IEEE
Standard 754-2008).

The Power ISA decimal floating-point category extends the Power
Architecture by adding a decimal floating-point unit. It uses the
existing 64-bit floating-point registers and extends the FPSCR register
to 64 bits, where it defines a decimal rounding-control field in the
extended space. For OpenPOWER, DFP support is defined as an optional
category. When DFP is supported as a vendor-specific implementation
capability, compilers can be used to implement DFP support. The
compilers should provide an option to generate DFP instructions or to
issue calls to DFP emulation software. The DFP parameters are passed in
floating-point registers.

As with other implementation-specific features, all
OpenPOWER-compliant programs must be able to execute, functionally
indistinguishably, on hardware with and without vendor-specific
extensions. It is the application's responsibility to transparently
adapt to the absence of vendor-specific features by using a library
responsive to the presence of DFP hardware, or in conjunction with
operating-system dynamic library services, to select from among
multiple DFP libraries that contain either a software
implementation or a hardware implementation.

Single-precision, double-precision, and quad-precision decimal
floating-point parameters shall be passed in the floating-point
registers. Single-precision decimal floating-point shall occupy the
lower half of a floating-point register. Quad-precision floating-point
values shall occupy an even/odd register pair. When passing
quad-precision decimal floating-point parameters in accordance with
this ABI, an odd floating-point register may be skipped in allocation
order to align quad-precision parameters and results in an even/odd
register pair. When a floating-point register is skipped during input
parameter allocation, words in the corresponding GPR or memory
doubleword in the parameter list are not skipped.
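
The even/odd pairing rule for quad-precision DFP parameters can be sketched as a small allocation helper. The helper assumes that DFP parameters draw from f1 through f13, the parameter registers listed for binary floating-point types; that range, the function name, and the constant are assumptions made for illustration. The sketch does not model the GPR or memory doublewords, which the text says are not skipped.

```c
/* Assumed last floating-point register available for parameters. */
enum { LAST_PARAM_FPR = 13 };

/*
 * Given the number of the next unallocated FPR (1 or greater), return the
 * even register that starts the even/odd pair for a quad-precision DFP
 * parameter, or -1 if no complete pair remains.
 */
int dfp128_pair_start(int next_fpr)
{
    int start = next_fpr + (next_fpr & 1);   /* skip an odd register */
    return (start + 1 <= LAST_PARAM_FPR) ? start : -1;
}
```

If the next free register is f1, it is skipped and the value occupies f2 and f3; if the next free register is f12, the value occupies f12 and f13.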

Floating-Point Register Roles for Decimal Floating-Point Types

| Register | Preservation Rules | Purpose |
| --- | --- | --- |
| FPSCR | Limited-access | Floating-Point Status and Control Register limited-access bits. Preservation rules governing the limited-access bits for the bit field [DRN] are presented in "Limited-Access Bits". |

The OpenPOWER vector-category instruction repertoire provides the
ability to reference 32 vector registers, each 128 bits wide, of the
vector-scalar register file, and a special-purpose register VSCR.
Throughout this document, the symbol vN is used, where N is a register
number, to refer to vector register N.

Vector Register Roles

| Register | Preservation Rules | Purpose |
| --- | --- | --- |
| v0–v1 | Volatile | Local variables. |
| v2–v13 | Volatile | Used for parameter passing and return values. |
| v14–v19 | Volatile | Local variables. |
| v20–v31 | Nonvolatile | Local variables. |
| VSCR | Limited-access | 32-bit Vector Status and Control Register. Preservation rules governing the limited-access bits for the bit field [NJ] are presented in "Limited-Access Bits". |

IEEE BINARY 128 EXTENDED PRECISION

Parameters in IEEE BINARY 128 EXTENDED PRECISION format shall be
passed in a single 128-bit vector register as if they were vector
values.

IBM EXTENDED PRECISION

Parameters in the IBM EXTENDED PRECISION format with a pair of
two double-precision floating-point values shall be passed in two
successive floating-point registers.

If only one value can be passed in a floating-point register, the
second value will be passed in a GPR or in memory in accordance
with the parameter passing rules for structure aggregates.

Limited-Access Bits

The Power ISA identifies a number of registers that have
mutability limited to the specific bit fields indicated in the
following list:

- FPSCR [VE]: The Floating-Point Invalid Operation Exception Enable
  bit [VE] of the FPSCR register.
- FPSCR [OE]: The Floating-Point Overflow Exception Enable bit [OE]
  of the FPSCR register.
- FPSCR [UE]: The Floating-Point Underflow Exception Enable bit [UE]
  of the FPSCR register.
- FPSCR [ZE]: The Floating-Point Zero Divide Exception Enable bit
  [ZE] of the FPSCR register.
- FPSCR [XE]: The Floating-Point Inexact Exception Enable bit [XE] of
  the FPSCR register.
- FPSCR [RN]: The Binary Floating-Point Rounding Control field [RN]
  of the FPSCR register.
- FPSCR [DRN]: The DFP Rounding Control field [DRN] of the 64-bit
  FPSCR register.
- VSCR [NJ]: The Vector Non-Java Mode field [NJ] of the VSCR
  register.

The bits composing these bit fields are identified as limited
access because this ABI manages how they are to be modified and
preserved across function calls. Limited-access bits may be changed
across function calls only if the called function has specific
permission to do so as indicated by the following conditions. A
function without permission to change the limited-access bits across a
function call shall save the value of the register before modifying the
bits and restore it before returning to its calling function.Limited-Access ConditionsStandard library functions expressly defined to change the state
of limited-access bits are not constrained by nonvolatile preservation
rules; for example, the fesetround( ) and feenableexcept( ) functions.
All other standard library functions shall save the old value of these
bits on entry, change the bits for their purpose, and restore the bits
before returning.Where a standard library function, such as qsort( ), calls
functions provided by an application, the following rules shall be
observed:The limited-access bits, on entry to the first call to such a
callback, must have the values they had on entry to the library
function.The limited-access bits, on entry to a subsequent call to
such a callback, must have the values they had on exit from the
previous call to such a callback.The limited-access bits, on exit from the library function,
must have the values they had on exit from the last call to such a
callback.The compiler can directly generate code that saves and restores
the limited-access bits.The values of the limited-access bits are unspecified on entry
into a signal handler because a library or user function can
temporarily modify the limited-access bits when the signal is taken.
When setjmp( ) returns from its first call (also known as direct
invocation), it does not change the limited-access bits. The
limited-access bits have the values they had on entry to the setjmp( )
function.When longjmp( ) is performed, it appears to be returning from a
call to setjmp( ). In this instance, the limited-access bits are not
restored to the values they had on entry to the setjmp( )
function.C library functions, such as _FPU_SETCW( ) defined in
<fpu_control.h>, may modify the limited-access bits of the FPSCR.
Additional C99 functions that can modify the FPSCR are defined in
<fenv.h>.The vector vec_mtvscr( ) function may change the limited-access NJ
bit.The unwinder does not modify limited-access bits. To avoid the
overhead of saving and restoring the FPSCR on every call, it is only
necessary to save it briefly before the call and to restore it after
any instructions or groups of instructions that need to change its
control flags have been completed. In some cases, that can be avoided
by using instructions that override the FPSCR rounding mode.If an exception and the resulting signal occur while the FPSCR is
temporarily modified, the signal handler cannot rely on the default
control flag settings and must behave as follows:If the signal handler will unwind the stack, print a
traceback, and abort the program, no other special handling is
needed.If the signal handler will adjust some register values (for
example, replace a NaN with a zero or infinity) and then resume
execution, no other special handling is needed. There is one
exception; if the signal handler changed the control flags, it
should restore them.If the signal handler will unwind the stack part way and
resume execution in a user exception handler, the application
should save the FPSCR beforehand and the exception handler should
restore its control flags.The Stack FrameA function shall establish a stack frame if it requires the use of
nonvolatile registers, its local variable usage cannot be optimized into
registers and the protected zone, or it calls another function. For more
information about the protected zone, see
. It need only allocate space
for the required minimal stack frame, consisting of a back-chain
doubleword (optionally containing a back-chain pointer), the saved CR
word, a reserved word, the saved LR doubleword, and the saved TOC pointer
doubleword. shows the relative layout of an
allocated stack frame following a nonleaf function call, where the stack
pointer points to the back-chain word of the caller's stack frame. By
default, the stack pointer always points to the back-chain word of the
most recently allocated stack frame. For more information, see
.In
the white areas indicate an
optional save area of the stack frame. For a description of the optional
save areas described by this ABI, see
.

General Stack Frame Requirements

The following general requirements apply to all stack frames:

The stack shall be quadword aligned.

The minimum stack frame size shall be 32 bytes. A minimum
stack frame consists of the first 4 doublewords (back-chain
doubleword, CR save word and reserved word, LR save doubleword, and
TOC pointer doubleword), with padding to meet the 16-byte alignment
requirement.

There is no maximum stack frame size defined.

Padding shall be added to the Local Variable Space of the
stack frame to maintain the defined stack frame alignment.

The stack pointer, r1, shall always point to the lowest
address doubleword of the most recently allocated stack
frame.

The stack shall start at high addresses and grow downward
toward lower addresses.

The lowest address doubleword (the back-chain word in
) shall point to the
previously allocated stack frame when a back chain is present. As
an exception, the first stack frame shall have a value of 0
(NULL).

If required, the stack pointer shall be decremented in the
called function's prologue and restored in the called function's
epilogue.

Before a function calls any other functions, it shall save
the value of the LR register into the LR save doubleword of the
caller's stack frame.

An optional frame pointer may be created if necessary (for
example, as a result of dynamic allocation on the stack as described
in
) to address arguments or local
variables.

An example of a minimum stack frame allocation that meets these
requirements is shown in
.

Minimum Stack Frame Elements

Back Chain Doubleword

When a back chain is not present, alternate information
compatible with the ABI unwind framework to unwind a stack must be
provided by the compiler, for all languages, regardless of language
features. A compiler that does not provide such system-compatible
unwind information must generate a back chain. All compilers shall
generate back chain information by default, and default libraries shall
contain a back chain.On systems where system-wide unwind capabilities are not
provided, compilers must not generate object files without back-chain
generation. A system shall provide a programmatic interface to query
unwind information when system-wide unwind capabilities are
provided.CR Save WordIf a function changes the value in any nonvolatile field of the
condition register, it shall first save at least the value of those
nonvolatile fields of the condition register, to restore before
function exit. The caller frame CR Save Word may be used as the save
location. This location in the current frame may be used as temporary
storage, which is volatile over function calls.Reserved WordThis word is reserved for system functions. Modifications of the
value contained in this word are prohibited unless explicitly allowed
by future ABI amendments.LR Save DoublewordIf a function changes the value of the link register, it must
first save the old value to restore before function exit. The caller
frame LR Save Doubleword may be used as the save location. This
location in the current frame may be used as temporary storage, which
is volatile over a function call.TOC Pointer DoublewordIf a function changes the value of the TOC pointer register, it
shall first save it in the TOC pointer doubleword.Optional Save AreasThis ABI provides a stack frame with a number of optional save
areas. These areas are always present, but may be of size 0. This
section indicates the relative position of these save areas in relation
to each other and the primary elements of the stack frame.Because the back-chain word of a stack frame must maintain
quadword alignment, a reserved word is introduced above the CR save
word to provide a quadword-aligned minimal stack frame and align the
doublewords within the fixed stack frame portion at doubleword
boundaries.An optional alignment padding to a quadword-boundary element
might be necessary above the Vector Register Save Area to provide
16-byte alignment, as shown in
.Floating-Point Register Save AreaIf a function changes the value in any nonvolatile floating-point
register fN, it shall first save the value in fN in the Floating-Point
Register Save Area and restore the register upon function exit.If full unwind information such as
DWARF is present, registers can be
saved in arbitrary locations in the stack
frame. If the system floating-point register save and restore
functions are to be used, the floating-point registers
shall be saved in a contiguous range. Floating-point register fN
is saved in the doubleword located 8 × (32 – N) bytes before the back-chain
word of the previous frame, as shown in
.The Floating-Point Register Save Area is always doubleword
aligned. The size of the Floating-Point Register Save Area depends upon
the number of floating-point registers that must be saved. If no
floating-point registers are to be saved, the Floating-Point Register
Save Area has a zero size.General-Purpose Register
Save AreaIf a function changes the value in any nonvolatile
general-purpose register rN, it shall first save the value in rN in the
General-Purpose Register Save Area and restore the register upon
function exit.If full unwind information such as DWARF is present, registers can be
saved in arbitrary locations in the stack frame. If the system
general-purpose register save and restore functions are to be used, the
general-purpose registers shall be saved in a contiguous range.
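The floating-point slot rule given earlier, 8 × (32 – N) bytes before the back-chain word of the previous frame, can be written out as arithmetic. The following is a minimal sketch under stated assumptions: the function names are hypothetical, f14 through f31 is taken to be the nonvolatile floating-point range, and a contiguous save area is assumed to run from the first saved register through f31.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the save slot for nonvolatile FPR fN, relative to the
 * back-chain doubleword of the previous frame (negative = lower address).
 * Assumption: N is in 14..31. */
static ptrdiff_t fpr_save_offset(int n)
{
    assert(n >= 14 && n <= 31);
    return -(ptrdiff_t)(8 * (32 - n));
}

/* Size in bytes of a contiguous save area holding f(first_saved)..f31.
 * Passing 32 means that no floating-point register is saved. */
static size_t fpr_save_area_size(int first_saved)
{
    assert(first_saved >= 14 && first_saved <= 32);
    return (size_t)(8 * (32 - first_saved));
}
```

Under these assumptions, f31 occupies the doubleword immediately below the back chain of the previous frame, and an area that saves nothing has size 0, which matches the zero-size case described for the save areas.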
General-purpose register rN is saved in the doubleword located 8 ×
(32 – N) bytes before the back-chain word of the previous frame, as shown
in
.The General-Purpose Register Save Area is always doubleword
aligned. The size of the General-Purpose Register Save Area depends
upon the number of general registers that must be saved. If no
general-purpose registers are to be saved, the General-Purpose Register
Save Area has a zero size.Vector Register Save AreaIf a function changes the value in any nonvolatile vector
register vN, it shall first save the value in vN in the Vector Register
Save Area and restore the register upon function exit.If full unwind information such as DWARF is present, registers can be
saved in arbitrary locations in the stack frame. If the system vector
register save and restore functions are to be used, the vector
registers shall be saved in a contiguous range. Vector register vN is
saved in the quadword located 16 × (32 – N) bytes before the
General-Purpose Register Save Area plus alignment padding, as shown in
.The Vector Register Save Area is always quadword aligned. If
necessary to ensure suitable alignment of the vector save area, a
padding doubleword may be introduced between the vector register and
General-Purpose Register Save Areas, and/or the Local Variable Space
may be expanded to the next quadword boundary. The size of the Vector
Register Save Area depends upon the number of vector registers that
must be saved. It ranges from 0 bytes to a maximum of 192 bytes (12 ×
16). If no vector registers are to be saved, the Vector Register Save
Area has a zero size.Local Variable SpaceThe Local Variable Space is used for allocation of local
variables. The Local Variable Space is located immediately above the
Parameter Save Area, at a higher address. There is no restriction on
the size of this area.Sometimes a register spill area is needed. It is typically
positioned above the Local Variable Space.The Local Variable Space also contains any parameters that need
to be assigned a memory address when the function's parameter list does
not require a save area to be allocated by the caller.Parameter Save
AreaThe Parameter Save Area shall be allocated by the caller for
function calls unless a prototype is provided for the callee indicating
that all parameters can be passed in registers. (This requires a
Parameter Save Area to be created for functions where the number and
type of parameters exceed the registers available for parameter
passing, for functions whose prototype contains an ellipsis to
indicate a variadic function, and for functions declared
without a prototype.)When the caller allocates the Parameter Save Area, it will always
be automatically quadword aligned because it must always start at SP +
32. It shall be at least 8 doublewords in length. If a function needs
to pass more than 8 doublewords of arguments, the Parameter Save Area
shall be large enough to spill all register-based parameters and to
contain the arguments that the caller stores in it.The calling function cannot expect that the contents of this save
area are valid when returning from the callee.The Parameter Save Area, which is located at a fixed offset of 32
bytes from the stack pointer, is reserved in each stack frame for use
as an argument list when an in-memory argument list is required. For
example, a Parameter Save Area must be allocated by the caller when
calling functions with the following characteristics:Prototyped functions where the parameters cannot be contained
in the parameter registersPrototyped functions with variadic argumentsFunctions without a suitable declaration available to the
caller to determine the called function's characteristics (for
example, functions in C without a prototype in scope, in accordance
with Brian Kernighan and Dennis Ritchie,
The C Programming Language, 1st
edition).Under these circumstances, a minimum of 8 doublewords are always
reserved. The size of this area must be sufficient to hold the longest
argument list being passed by the function that owns the stack frame.
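The sizing rule above can be made concrete with a small C sketch. The type name frame_header and the function header_plus_psa_bytes are hypothetical. The field offsets follow the minimum stack frame described earlier (back-chain doubleword, CR save word, reserved word, LR save doubleword, TOC pointer doubleword), and register save areas and local variables are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed 32-byte header at the low-address end of a stack frame. */
struct frame_header {
    uint64_t back_chain;   /* offset  0: back-chain doubleword */
    uint32_t cr_save;      /* offset  8: CR save word */
    uint32_t reserved;     /* offset 12: reserved word */
    uint64_t lr_save;      /* offset 16: LR save doubleword */
    uint64_t toc_save;     /* offset 24: TOC pointer doubleword */
};

_Static_assert(sizeof(struct frame_header) == 32, "header is 4 doublewords");
_Static_assert(offsetof(struct frame_header, cr_save) == 8, "CR save word");
_Static_assert(offsetof(struct frame_header, lr_save) == 16, "LR save");
_Static_assert(offsetof(struct frame_header, toc_save) == 24, "TOC save");

/* Bytes needed for the header plus a Parameter Save Area that can hold
 * the longest outgoing argument list, measured in doublewords. */
static size_t header_plus_psa_bytes(size_t max_arg_dw)
{
    size_t dw = max_arg_dw < 8 ? 8 : max_arg_dw;   /* at least 8 doublewords */
    size_t bytes = sizeof(struct frame_header) + 8 * dw;
    return (bytes + 15) & ~(size_t)15;             /* keep quadword alignment */
}
```

Because the header is 32 bytes, the Parameter Save Area in this model begins at offset 32 from the stack pointer, which is the fixed offset stated in this section.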
Although not all arguments for a particular call are located in
storage, when an in-memory parameter list is required, consider the
parameters to be forming a list in this area. Each argument occupies
one or more doublewords.More arguments might be passed than can be stored in the
parameter registers. In that case, the remaining arguments are stored
in the Parameter Save Area. The values passed on the stack are
identical to the values placed in registers. Therefore, the stack
contains register images for the values that are not placed into
registers.This ABI uses a simple va_list type for variable lists to point
to the memory location of the next parameter. Therefore, regardless of
type, variable arguments must always be in the same location so that
they can be found at runtime. The first 8 doublewords are located in
general registers r3–r10. Any additional doublewords are located in
the stack Parameter Save Area. Alignment requirements such as those for
vector types may require the va_list pointer to first be aligned before
accessing a value.Follow these rules for parameter passing:Map each argument to enough doublewords in the Parameter Save
Area to hold its value.Map single-precision floating-point values to the
least-significant word in a single doubleword.Map double-precision floating-point values to a single
doubleword.Map simple integer types (char, short, int, long, enum) to a
single doubleword. Sign or zero extend values shorter than a
doubleword to a doubleword based on whether the source data type is
signed or unsigned.When 128-bit integer types are passed by value, map each to
two consecutive GPRs, two consecutive doublewords, or a GPR and a
doubleword.In big-endian environments, the most-significant doubleword
of the quadword (__int128) parameter is stored in the lower
numbered GPR or parameter word. The least-significant doubleword
of the quadword (__int128) is stored in the higher numbered GPR
or parameter word. In little-endian environments, the
least-significant doubleword of the quadword (__int128) parameter
is stored in the lower numbered GPR or parameter word. The
most-significant doubleword of the quadword (__int128) is stored
in the higher numbered GPR or parameter word. The required alignment of __int128 data types is 16 bytes.
Therefore, by-value parameters must be copied to a new location in
the local variable area of the callee's stack frame before the
address of the type can be provided (for example, using the
address-of operator, or when the variable is to be passed by
reference), when the incoming parameter is not aligned at a 16-byte
boundary.If extended precision floating-point values in IEEE BINARY
128 EXTENDED PRECISION format are supported (see
), map them to a single
quadword, quadword aligned. This might result in skipped
doublewords in the Parameter Save Area.If extended precision floating-point values in IBM EXTENDED
PRECISION format are supported (see
), map them to two
consecutive doublewords. The required alignment of IBM EXTENDED
PRECISION data types is 16 bytes. Therefore, by-value parameters
must be copied to a new location in the local variable area of the
callee's stack frame before the address of the type can be provided
(for example, using the address-of operator, or when the variable
is to be passed by reference), when the incoming parameter is not
aligned at a 16-byte boundary.Map complex floating-point and complex integer types as if
the argument was specified as separate real and imaginary
parts.Map pointers to a single doubleword.Map vectors to a single quadword, quadword aligned. This
might result in skipped doublewords in the Parameter Save
Area.Map fixed-size aggregates and unions passed by value to as
many doublewords of the Parameter Save Area as the value uses in
memory. Align aggregates and unions as follows:
Aggregates that contain qualified floating-point or vector
arguments are normally aligned at the alignment of their base type.
For more information about qualified arguments, see
.Other aggregates are normally aligned in accordance with the
aggregate's defined alignment.The alignment will never be larger than the stack
frame alignment (16 bytes).This might result in
doublewords being skipped for alignment. When a doubleword
in the Parameter Save Area (or its GPR copy) contains at
least a portion of a structure, that doubleword must
contain all other portions mapping to the same doubleword.
(That is, a doubleword can either be completely valid, or
completely invalid, but not partially valid and
invalid, except in the last doubleword where invalid
padding may be present.)Pad an aggregate or union smaller than one doubleword in
size, but having a non-zero size,
so that it is in the
least-significant bits of the doubleword.
Pad all others, if
necessary, at their tail. Variable size aggregates or unions are
passed by reference.Map other scalar values to the number of doublewords required
by their size.Future data types that have an architecturally defined
quadword-required alignment will be aligned at a quadword
boundary.If the callee has a known prototype, arguments are converted
to the type of the corresponding parameter when loaded to their
parameter registers or when being mapped into the Parameter Save
Area. For example, if a long is used as an argument to a
double parameter, the value is converted to double-precision and
mapped to a doubleword in the Parameter Save Area.Protected ZoneThe 288 bytes below the stack pointer are available as volatile
program storage that is not preserved across function calls. Interrupt
handlers and any other functions that might run without an explicit
call must take care to preserve a protected zone, also referred to as
the red zone, of 512 bytes that consists of:The 288-byte volatile program storage region that is used to
hold saved registers and local variablesAn additional 224 bytes below the volatile program storage
region that is set aside as a volatile system storage region for
system functionsIf a function does not call other functions and does not need
more stack space than is available in the volatile program storage
region (that is, 288 bytes), it does not need to have a stack frame.
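The frame decision just described reduces to a simple predicate. The following is a minimal sketch of a compiler-side check. The constant and function names are hypothetical, and stack_bytes stands for everything the function wants to keep below the stack pointer (saved registers plus local variables).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    VOLATILE_PROGRAM_BYTES = 288,  /* usable by compiled code below r1 */
    VOLATILE_SYSTEM_BYTES  = 224,  /* set aside for system functions */
    PROTECTED_ZONE_BYTES   = VOLATILE_PROGRAM_BYTES + VOLATILE_SYSTEM_BYTES
};

_Static_assert(PROTECTED_ZONE_BYTES == 512, "288 + 224 = 512");

/* A function may skip allocating a frame only if it is a leaf and its
 * stack usage fits in the 288-byte volatile program storage region. */
static bool needs_stack_frame(bool calls_other_functions, size_t stack_bytes)
{
    return calls_other_functions
        || stack_bytes > (size_t)VOLATILE_PROGRAM_BYTES;
}
```

Code that can run without an explicit call, such as an interrupt handler, would use PROTECTED_ZONE_BYTES rather than the 288-byte figure when deciding how far below the stack pointer it must stay clear.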
The 224-byte volatile system storage region is not available to
compilers for allocation to saved registers and local variables.Parameter Passing in RegistersFor the OpenPOWER Architecture, it is more efficient to pass
arguments to functions in registers rather than through memory. For more
information about passing parameters through memory, see
. For the OpenPOWER ABI, the
following parameters can be passed in registers:Up to eight arguments can be passed in general-purpose
registers r3–r10.Up to thirteen qualified floating-point arguments can be passed
in floating-point registers f1–f13 or up to twelve in vector
registers v2–v13.Up to thirteen single-precision or double-precision decimal
floating-point arguments can be passed in floating-point registers
f1–f13.Up to six quad-precision decimal floating-point arguments can
be passed in even-odd floating-point register pairs f2–f13.Up to 12 qualified vector arguments can be passed in v2–v13.A qualified floating-point argument corresponds to:A scalar floating-point data typeEach member of a complex floating-point typeA member of a homogeneous aggregate of multiple like data types
passed in up to eight floating-point registersA homogeneous aggregate can consist of a variety of nested
constructs including structures, unions, and array members, which shall
be traversed to determine the types and number of members of the base
floating-point type. (A complex floating-point data type is treated as if
two separate scalar values of the base type were passed.)Homogeneous floating-point aggregates can have up to four IBM
EXTENDED PRECISION members, four IEEE BINARY 128 EXTENDED PRECISION
members, four _Decimal128 members, or eight members of other
floating-point types. (Unions are treated as their largest member. For
homogeneous unions, different union alternatives may have different
sizes, provided that all union members are homogeneous with respect to
each other.) They are passed in floating-point registers if parameters of
that type would be passed in floating-point registers. They are passed in
vector registers if parameters of that type would be passed in vector
registers. They are passed as if each member was specified as a separate
parameter.A qualified vector argument corresponds to:A vector data typeA member of a homogeneous aggregate of multiple like data types
passed in up to eight vector registersAny future type requiring 16-byte alignment (see
) or processed in vector
registersFor the purpose of determining a qualified floating-point argument,
_Float128 shall be considered a vector data type. In addition, _Float128
is like a vector data type for determining if multiple aggregate members
are like.A homogeneous aggregate can consist of a variety of nested
constructs including structures, unions, and array members, which shall
be traversed to determine the types and number of members of the base
vector type. Homogeneous vector aggregates with up to eight members are
passed in up to eight vector registers as if each member was specified as
a separate parameter. (Unions are treated as their largest member. For
homogeneous unions, different union alternatives may have different
sizes, provided that all union members are homogeneous with respect to
each other.)Floating-point and vector aggregates that contain padding
words and integer fields with a width of 0 should not be treated as
homogeneous aggregates.A homogeneous aggregate is either a homogeneous floating-point
aggregate or a homogeneous vector aggregate. This ABI does not specify
homogeneous aggregates for integer types.Binary extended precision numbers in IEEE BINARY 128 EXTENDED
PRECISION format (see
) are passed using a VMX
register. Binary extended precision numbers in IBM EXTENDED PRECISION
format (see
) are passed using two
successive floating-point registers. Single-precision decimal
floating-point numbers (see
) are passed in the lower half
of a floating-point register. Quad-precision decimal floating-point
numbers (see
) are passed using a paired
even/odd floating-point register pair. A floating-point register might be
skipped to allocate an even/odd register pair when necessary. When a
floating-point register is skipped, no corresponding memory word is
skipped in the natural home location; that is, the corresponding GPR or
memory doubleword in the parameter list.All other aggregates are passed in consecutive GPRs, in GPRs and in
memory, or in memory.When a parameter is passed in a floating-point or vector register,
a number of GPRs are skipped, in allocation order, commensurate to the
size of the corresponding in-memory representation of the passed
argument's type.The parameter size is always rounded up to the next multiple of a
doubleword.Consequently, each parameter of a non-zero size is allocated to
at least one doubleword.Full doubleword rule:When a doubleword in the Parameter Save Area (or its GPR copy)
contains at least a portion of a structure, that doubleword must contain
all other portions mapping to the same doubleword. (That is, a doubleword
can either be completely valid, or completely invalid, but not partially
valid and invalid, except in the last doubleword where invalid padding
may be present.)IEEE BINARY 128 EXTENDED PRECISIONUp to 12 quad-precision parameters can be passed in v2–v13.
For the purpose of determining qualified floating-point and vector
arguments, an IEEE 128b type shall be considered a "like" vector
type, and a complex _Float128 shall be treated as two individual
scalar elements.IBM EXTENDED PRECISIONIBM EXTENDED PRECISION format parameters are passed as if they
were a struct consisting of separate double parameters.IBM EXTENDED PRECISION format parameters shall be considered as
a distinct type for the determination of homogeneous aggregates.
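The register rules in this section (r3–r10, f1–f13, v2–v13, and the skipping of GPRs that shadow floating-point and vector arguments) can be sketched for the simplest case. This is an illustrative model, not the ABI algorithm itself: it assumes a prototype is in scope, handles only named scalar arguments, treats every integer and floating-point argument as one doubleword, and omits aggregates, IBM EXTENDED PRECISION, _Decimal128, and variadic arguments. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum arg_class { ARG_INT, ARG_FP, ARG_VEC };      /* scalar classes only */
enum location  { IN_GPR, IN_FPR, IN_VR, IN_MEM };

struct placement {
    enum location where;
    int index;  /* register number, or Parameter Save Area doubleword if IN_MEM */
};

static void assign_args(const enum arg_class *args, size_t n,
                        struct placement *out)
{
    int gr = 3, fr = 1, vr = 2;   /* next free r3-r10, f1-f13, v2-v13 */

    for (size_t i = 0; i < n; i++) {
        switch (args[i]) {
        case ARG_INT:
            out[i] = gr <= 10 ? (struct placement){ IN_GPR, gr }
                              : (struct placement){ IN_MEM, gr - 3 };
            gr += 1;
            break;
        case ARG_FP:
            /* With one-doubleword scalars only, running out of FPRs
             * implies the GPRs are already exhausted as well. */
            out[i] = fr <= 13 ? (struct placement){ IN_FPR, fr++ }
                              : (struct placement){ IN_MEM, gr - 3 };
            gr += 1;              /* the shadowing doubleword is skipped */
            break;
        case ARG_VEC:
            if ((gr & 1) == 0)    /* quadword-align the home location */
                gr += 1;
            out[i] = vr <= 13 ? (struct placement){ IN_VR, vr++ }
                              : (struct placement){ IN_MEM, gr - 3 };
            gr += 2;              /* a vector shadows two doublewords */
            break;
        }
    }
}
```

For a hypothetical prototype taking (int, double, int, vector, int), this model places the arguments in r3, f1, r5, v2, and r9: r4 shadows the double, r6 is skipped for alignment, and r7 and r8 shadow the vector.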
If fewer arguments are needed, the unused registers defined
previously will contain undefined values on entry to the called
function.If there are more arguments than registers or no function prototype
is provided, a function must provide space for all arguments in its stack
frame. When this happens, only the minimum storage needed to contain all
arguments (including allocating space for parameters passed in registers)
needs to be allocated in the stack frame.General-purpose registers r3–r10 correspond to the allocation of
parameters to the first 8 doublewords of the Parameter Save Area.
Specifically, this requires a suitable number of general-purpose
registers to be skipped to correspond to parameters passed in
floating-point and vector registers.If an argument corresponds to the
ellipsis in the prototype (an unnamed parameter), the caller shall promote
float values to double and shall pass the argument in a GPR or in the
Parameter Save Area.If no function prototype is available, the caller shall promote
float values to double and pass floating-point parameters in both
available floating-point registers and in the Parameter Save Area. If no
function prototype is available, the caller shall pass vector parameters
in both available vector registers and in the Parameter Save Area. (If
the callee expects a float parameter, the result will be
incorrect.)It is the callee's responsibility to allocate storage for the
stored data in the local variable area. When the callee's parameter list
indicates that the caller must allocate the Parameter Save Area (because
at least one parameter must be passed in memory or an ellipsis is present
in the prototype), the callee may use the preallocated Parameter Save
Area to save incoming parameters.Parameter Passing Register Selection AlgorithmThe following algorithm describes where arguments are passed for
the C language. In this algorithm, arguments are assumed to be ordered
from left (first argument) to right. The actual order of evaluation for
arguments is unspecified.gr contains the number of the next available general-purpose
register.fr contains the number of the next available floating-point
register.vr contains the number of the next available vector
register.The following types refer to the type of the argument as
declared by the function prototype. The argument values are converted
(if necessary) to the types of the prototype arguments before passing
them to the called function.If a prototype is not present, or it is a variable argument
prototype and the argument is after the ellipsis, the type refers to
the type of the data objects being passed to the called
function.INITIALIZE: If the function return type requires a storage
buffer, set gr = 4; else set gr = 3.Set fr = 1
Set vr = 2SCAN: If there are no more arguments, terminate. Otherwise,
allocate as follows based on the class of the function
argument:

switch(class(argument))
    unnamed parameter:
        if gr > 10
            goto mem_argument
        size = size_in_DW(argument)
        reg_size = min(size, 11 - gr)
        pass (GPR, gr, first_n_DW (argument, reg_size));
        if remaining_members
            argument = after_n_DW(argument, reg_size)
            goto mem_argument
        break;
    integer:    // up to 64b
    pointer:    // this also includes all pass by reference values
        if gr > 10
            goto mem_argument
        pass (GPR, gr, argument);
        gr++
        break;
    aggregate:
        if (homogeneous(argument, float) and regs_needed(members(argument)) <= 8)
            if (register_type_used (type (argument)) == vr)
                goto use_vrs;
            n_fregs = n_fregs_for_type(member_type(argument, 0))
            agg_size = members(argument) * n_fregs
            reg_size = min(agg_size, 15 - fr)
            pass (FPR, fr, first_n_DW(argument, reg_size))
            fr += reg_size;
            gr += size_in_DW (first_n_DW(argument, reg_size))
            if remaining_members
                argument = after_n_DW(argument, reg_size)
                goto gpr_struct
            break;
        if (homogeneous(argument, vector) and members(argument) <= 8)
    use_vrs:
            agg_size = members(argument)
            reg_size = min(agg_size, 14 - vr)
            if ((gr & 1) == 0)    // align vector in memory
                gr++
            pass (VR, vr, first_n_elements(argument, reg_size));
            vr += reg_size
            gr += size_in_DW (first_n_elements(argument, reg_size))
            if remaining_members
                argument = after_n_elements(argument, reg_size)
                goto gpr_struct
            break;
        if gr > 10
            goto mem_argument
        size = size_in_DW(argument)
    gpr_struct:
        reg_size = min(size, 11 - gr)
        pass (GPR, gr, first_n_DW (argument, reg_size));
        gr += size_in_DW (first_n_DW (argument, reg_size))
        if remaining_members
            argument = after_n_DW(argument, reg_size)
            goto mem_argument
        break;
    float:
        // float is passed in one FPR.
        // double is passed in one FPR.
        // IBM EXTENDED PRECISION is passed in the next two FPRs.
        // IEEE BINARY 128 EXTENDED PRECISION is passed in one VR.
        // _Decimal32 is passed in the lower half of one FPR.
        // _Decimal64 is passed in one FPR.
        // _Decimal128 is passed in an even-odd FPR pair, skipping an FPR if necessary.
        if (register_type_used (type (argument)) == vr)
            // Assumes == vr is true for IEEE BINARY 128 EXTENDED PRECISION.
            goto use_vr;
        fr += align_pad(fr, type(argument))
        // Assumes align_pad = 8 for _Decimal128 if fr is odd; otherwise = 0.
        if fr > 14
            goto mem_argument
        n_fregs = n_fregs_for_type(argument)
        // Assumes n_fregs_for_type == 2 for IBM EXTENDED PRECISION
        // or _Decimal128, == 1 for float, double, _Decimal32 or _Decimal64.
        pass (FPR, fr, argument)
        fr += n_fregs
        gr += size_in_DW(argument)
        break;
    vector:
    use_vr:
        if vr > 13
            goto mem_argument
        if ((gr & 1) == 0)    // align vector in memory
            gr++
        pass (VR, vr, argument)
        vr++
        gr += 2
        break;
next argument;
mem_argument:
    need_save_area = TRUE
    pass (stack, gr, argument)
    gr += size_in_DW(argument)
next argument;

All complex data types are handled as if two scalar values of the
base type were passed as separate parameters.If the callee takes the address of any of its parameters, values
passed in registers are stored to memory. It is the callee's
responsibility to allocate storage for the stored data in the local
variable area. When the callee's parameter list indicates that the
caller must allocate the Parameter Save Area (because at least one
parameter must be passed in memory, or an ellipsis is present in the
prototype), the callee may use the preallocated Parameter Save Area to
save incoming parameters. (If an ellipsis is present, using the
preallocated Parameter Save Area ensures that all arguments are
contiguous.) If the compilation unit for the caller contains a function
prototype, but the callee has a mismatching definition, this may result
in the wrong values being stored.If the declaration of a function that is used by the caller
does not match the definition for the called function, corruption of
the caller's stack space can occur.Parameter Passing ExamplesThis section provides some examples that use the algorithm
described in
. shows how parameters are
passed for a function that passes arguments in GPRs, FPRs, and
memory.

If a prototype is not in scope:

The floating-point argument ff is also passed in r4.
The long double argument ld is also passed in r6 and r7.
The floating-point argument gg is also passed in r10.
The floating-point argument hh is also stored into the Parameter Save Area.

If a prototype containing an ellipsis describes any of these
floating-point arguments as being part of the variable argument part,
the general registers and Parameter Save Area are used as when no
prototype is in scope. The floating-point registers are not
used. shows the definitions that
are used in the remaining examples of parameter passing. shows how parameters are
passed for a function that passes homogeneous floating-point aggregates
and integer parameters in registers without allocating a Parameter Save
Area because all the parameters can be contained in the
registers. shows how parameters are
passed for a function that passes homogeneous floating-point aggregates
and integer parameters in registers without allocating a Parameter Save
Area because all parameters can be passed in registers. shows how parameters are
passed for a function that passes floating-point scalars and
homogeneous floating-point aggregates in registers and memory because
the number of available parameter registers has been exceeded. It
demonstrates the full doubleword rule. shows how parameters are
passed for a function that passes homogeneous floating-point aggregates
and floating-point scalars in general-purpose registers because the
number of available floating-point registers has been exceeded. In this
figure, a Parameter Save Area is not allocated because all the
parameters can be passed in registers. shows how parameters are
passed for a function that passes homogeneous floating-point aggregates
in FPRs, GPRs, and memory because the number of available
floating-point and integer parameter registers has been exceeded. In
this figure, a Parameter Save Area is allocated because all the
parameters cannot be passed in the registers. This figure also
demonstrates the full doubleword rule applied to GPR7. shows how parameters are
passed for a function that passes vector data types in VRs, GPRs, and
FPRs. In this figure, a Parameter Save Area is not allocated. shows how parameters are
passed for a function that passes vector data types in VRs, GPRs, and
FPRs. In this figure, a Parameter Save Area is allocated.When a function takes the address of at least one of its
arguments, it is the callee's responsibility to store function
parameters in memory and provide a suitable memory address for
parameters passed in registers.

For functions without an ellipsis where all parameters can be contained in the
parameter registers, the callee shall allocate storage for
saved parameters in the local variable save area because the caller may
not have allocated a Parameter Save Area. This can be performed, for
example, in the prologue. For functions where the caller must allocate
a Parameter Save Area because at least one parameter must be passed in
memory, or because the prototype contains an ellipsis to indicate a
variadic function, named parameters may be spilled to
the Parameter Save Area.

Variable Argument Lists

C programs that are intended to be portable across different
compilers and architectures must use the header file <stdarg.h> to
deal with variable argument lists. This header file contains a set of
macro definitions that define how to step through an argument list. The
implementation of this header file may vary across different
architectures, but the interface is the same.

C programs that do not use this header file for the variable
argument list and assume that all the arguments are passed on the stack
in increasing order are not portable, especially on
architectures that pass some of the arguments in registers. The Power
Architecture is one such architecture.

The parameter
list may be zero length and is only allocated when parameters are
spilled, when a function has unnamed parameters, or when no prototype is
provided. When the Parameter Save Area is allocated, the Parameter Save
Area must be large enough to accommodate all parameters, including
parameters passed in registers.

Return Values

Functions that return a value shall place the result in the same
registers as if the return value was the first named input argument to a
function unless the return value is a nonhomogeneous aggregate larger
than 2 doublewords or a homogeneous aggregate with more than eight
registers.

For a definition of homogeneous aggregates, see
. (Homogeneous aggregates are arrays, structs, or unions of a
homogeneous floating-point or vector type and of a known fixed size.)
Therefore, IBM EXTENDED PRECISION values are returned in f1:f2.

Homogeneous floating-point or vector aggregate return values that
consist of up to eight registers with up to eight elements will be
returned in floating-point or vector registers that correspond to the
parameter registers that would be used if the return value type were the
first input parameter to a function.

Aggregates that are not returned by value are returned in a storage
buffer provided by the caller. The address is provided as a hidden first
input argument in general-purpose register r3.

Quadword decimal floating-point return values shall be returned
in the first paired floating-point register parameter pair; that is,
f2:f3.

Functions that return values of the following types shall place the
result in register r3 as signed or unsigned integers, as appropriate,
sign extended or zero extended to 64 bits where necessary:

char
enum
short
int
long
pointer to any type
_Bool

Coding Examples

The following ISO C coding examples are provided as illustrations of
how operations may be done, not how they shall be done, for calling
functions, accessing static data, and transferring control from one part of
a program to another. They are shown as code fragments with simplifications
to explain addressing modes. They do not necessarily show the optimal code
sequences or compiler output. The small data area is not used in any of
them. For more information, see
.

The previous sections explicitly specify what a program, operating
system, and processor may and may not assume and are the definitive
reference to be used.

In these examples, absolute code and position-independent code are
referenced.

When instructions hold absolute addresses, a program must be loaded
at a specific virtual address to permit the absolute code model to
work.

When instructions hold relative addresses, a program library can be
loaded at various positions in virtual memory; this is referred to as the
position-independent code model.

Code Model Overview

Executable modules can be built to use either position-dependent or
position-independent memory references. Position-dependent references
generally result in better performing programs.

Static modules representing the base executables and libraries
intended to be statically linked into a base executable can be compiled
and linked using either position-dependent or position-independent
code.

Dynamic shared objects (DSOs) intended to be used as shared
libraries and position-independent executables must be compiled and
linked as position-independent code.

Position-Dependent Code

Static objects are preferably built by using position-dependent
code. Position-dependent code can reference data in one of the
following ways:

Directly by creating absolute memory addresses using a
combination of instructions such as lis, addi, and memory
instructions:

lis r16, symbol@ha
ld r12, symbol@l(r16)
lis r16, symbol2@ha
addi r16, r16, symbol2@l
lvx v1, r0, r16

By instantiating the TOC pointer in r2 and using TOC-pointer
relative addressing. (For more information, see
.)

<load TOC base to r2>
ld r12, symbol@toc(r2)
li r16, symbol2@toc
lvx v1, r2, r16

By instantiating the TOC pointer in r2 and using GOT-indirect
addressing:

<load TOC base to r2>
ld r12, symbol@got(r2)
ld r12, 0(r12)
ld r12, symbol2@got(r2)
lvx v1, 0, r12

In the OpenPOWER ELF V2 ABI, position-dependent code built with
this addressing scheme may have a Global Offset Table (GOT) in the data
segment that holds addresses. (For more information, see
.) For position-dependent
code, GOT entries are typically updated to reflect the absolute virtual
addresses of the reference objects at static link time. Any remaining
GOT entries are updated by the loader to reflect the absolute virtual
addresses that were assigned for the process. These data segments are
private, while the text segments are shared. In systems based on the
Power Architecture, the GOT can be addressed with a single instruction
if the GOT size is less than 65,536 bytes. A larger GOT requires more
general code to access all of its entries.

OpenPOWER-compliant processor hardware implementation and linker
optimizations described here work together to enable efficient code
generation for applications with large GOTs. They use instruction
fusion to combine multiple ISA instructions into a single internal
operation.

Offsets from the TOC register can be generated using
either:

16-bit offsets (small code model), with a maximum addressing
reach of 64 KB for TOC-based relative addressing or GOT
accesses

32-bit offsets (medium or large code model), with a maximum
addressing reach of 4 GB

Efficient implementation of the OpenPOWER ELF V2 ABI medium code
model is supported by additional optimizations present in
OpenPOWER-compliant processor implementations and the OpenPOWER ABI
toolchain (see
).

Position-dependent code is most efficient if the application is
loaded in the first 2 GB of the address space because direct address
references and TOC-pointer initializations can be performed using a
two-instruction sequence.

Position-Independent Code

A shared object file is mapped with virtual addresses to avoid
conflicts with other segments in the process. Because of this mapping,
shared objects use position-independent code, which means that the
instructions do not contain any absolute addresses. Avoiding the use of
absolute addresses allows shared objects to be loaded into different
virtual address spaces without code modification, which can allow
multiple processes to share the same text segment for a shared object
file.

Two techniques are used to deal with position-independent
code:

First, branch instructions use an offset to the current
effective address (EA) or use registers to hold addresses. The
Power Architecture provides both EA-relative branch instructions
and branch instructions that use registers. In both cases, absolute
addressing is not required.

Second, when absolute addressing is required, the value can be
computed with a Global Offset Table (GOT), which holds the information
for address computation. Static and const references can be
accessed using a TOC pointer relative addressing model, while (shared)
extern references must be accessed using the GOT-indirect addressing
scheme. Both addressing schemes require a TOC pointer to be initialized.
DSOs can access data as follows:

By instantiating the TOC pointer in r2 and using TOC pointer
relative addressing (for private data).

<load TOC base to r2>
ld r12, symbol@toc(r2)
li r16, symbol2@toc
lvx v1, r2, r16

By instantiating the TOC pointer in r2 and using GOT-indirect
addressing (for shared data or for very large data
sections):

<load TOC base to r2>
ld r12, symbol@got(r2)
ld r12, 0(r12)
ld r12, symbol2@got(r2)
lvx v1, 0, r12

Position-independent executables or shared objects have a GOT in
the data segment that holds addresses. When the system creates a memory
image from the file, the GOT entries are updated to reflect the
absolute virtual addresses that were assigned for the process. These
data segments are private, while the text segments are shared. In
systems based on the Power Architecture, the GOT can be addressed with
a single instruction if the GOT size is less than 65,536 bytes. A
larger GOT requires more general code to access all of its
entries.

The OpenPOWER-compliant processor hardware implementation and
linker optimizations described here work together to enable efficient
code generation for applications with large GOTs. They use instruction
fusion to combine multiple ISA instructions into a single internal
operation.

Code Models

Compilers may provide different code models depending on the
expected size of the TOC and the size of the entire executable or
shared library.

Small code model: The TOC is accessed using 16-bit offsets
from the TOC pointer. This limits the size of a single TOC to 64
KB. Position-independent code uses GOT-indirect addressing to
access other objects in the binary.

Large code model: The TOC is accessed using 32-bit offsets
from the TOC pointer, except for .sdata and .sbss, which are
accessed using 16-bit offsets from the TOC pointer. This allows a
TOC of at least 2 GB. Position-independent code uses GOT-indirect
addressing to access other objects in the binary.

Medium code model: Like the large code model, the TOC is
accessed using 32-bit offsets from the TOC pointer, except for
.sdata and .sbss, which are accessed using 16-bit offsets. In
addition, accesses to module-local code and data objects use TOC
pointer relative addressing with 32-bit offsets. Using TOC pointer
relative addressing removes a level of indirection, resulting in
faster access and a smaller GOT. However, it limits the size of the
entire binary to between 2 GB and 4 GB, depending on the placement
of the TOC base.

The medium code model is the default for compilers, and it
is applicable to most programs and libraries. The code examples
in this document generally use the medium code model.

When linking medium and large code model relocatable objects, the
linker should place the .sdata and .sbss sections near to the TOC
base.

A linker must allow linking of relocatable object files using
different code models. This may be accomplished by sorting the
constituent sections of the TOC so that sections that are accessed
using 16-bit offsets are placed near to the TOC base, by using multiple
TOCs, or by some other method. The suggested allocation order of
sections is provided in
.

Function Prologue and Epilogue

A function's prologue and epilogue are described in this
section.

Function Prologue

A function's prologue establishes addressability by initializing
a TOC pointer in register r2, if necessary, and a stack frame, if
necessary, and may save any nonvolatile registers it uses.

All functions have a global entry point (GEP) available to any
caller and pointing to the beginning of the prologue. Some functions
may have a secondary entry point to optimize the cost of TOC pointer
management. In particular, functions within a common module sharing the
same TOC base value in r2 may be entered using a secondary entry point
(the local entry point or LEP) that may bypass the code that loads a
suitable TOC pointer value into the r2 register. When a dynamic or
global linker transfers control from a function to another function in
the same module, it
may choose (but is not required) to use the local
entry point when the r2 register is known to hold a valid TOC base
value. Function pointers shared between modules shall always use the
global entry point to specify the address of a function.

When a linker causes control to transfer to a global entry point,
it must insert a glue code sequence that loads r12 with the global
entry-point address. Code at the global entry point can assume that
register r12 points to the GEP.

Addresses between the global and local entry points must not be
branch targets, either for function entry or referenced by program
logic of the function, because a linker may rewrite the code sequence
establishing addressability to a different, more optimized form.

For example, while linking a static module with a known load
address in the first 2 GB of the address space, the following code
sequence may be rewritten:

addis r2, r12, .TOC.-func@ha
addi r2, r2, .TOC.-func@l

It may be rewritten by a linker or assembler to an equivalent
form that is faster due to instruction fusion, such as:

lis r2, .TOC.@ha
addi r2, r2, .TOC.@l

In addition to establishing addressability, the function prologue
is responsible for the following functions:

Creating a stack frame when required
Saving any nonvolatile registers that are used by the function
Saving any limited-access bits that are used by the function,
per the rules described in

This ABI shall be used in conjunction with the Power Architecture
that implements the
mfocrf architecture level. Further,
OpenPOWER-compliant processors shall implement implementation-defined
bits in a manner to allow the combination of multiple
mfocrf results with an OR instruction; for example,
to yield a word in r0 including all three preserved CRs as
follows:

mfocrf r0, crf2
mfocrf r12, crf3
or r0, r0, r12
mfocrf r12, crf4
or r0, r0, r12

Specifically, this allows each OpenPOWER-compliant processor
implementation to set each field to hold either 0 or the correct
in-order value of the corresponding CR field at the point where the
mfocrf instruction is performed.

Assembly Language Syntax for Defining Entry Points

When a function has two entry points, the global entry point is
defined as a symbol. The local entry point is defined with the
.localentry assembler pseudo op.

my_func:
addis r2, r12, (.TOC.-my_func)@ha
addi r2, r2, (.TOC.-my_func)@l
.localentry my_func, .-my_func
... # function definition
blr

 shows how to represent dual
entry points in symbol tables in an ELF object file. It also defines
the meaning of the second parameter, which is put in the three
most-significant bits of the st_other field in the ELF Symbol Table
entry.

Function Epilogue

The purpose of the epilogue is to perform the following
functions:

Restore all registers and limited-access bits that were saved
by the function's prologue.
Restore the last stack frame.
Return to the caller.

Rules for Prologue and Epilogue Sequences

Set function prologue and function epilogue code sequences are
not imposed by this ABI. There are several rules that must be adhered
to in order to ensure reliable and consistent call chain
backtracing:

Before a function calls any other function, it shall
establish its own stack frame, whose size shall be a multiple of 16
bytes.

In instances where a function's prologue creates a stack
frame, the back-chain word of the stack frame shall be updated
atomically with the value of the stack pointer (r1) when a back
chain is implemented. (This must be supported as default by all ELF
V2 ABI-compliant environments.) This task can be done by using one
of the following Store Doubleword with Update instructions:

    Store Doubleword with Update instruction with relevant
    negative displacement for stack frames that are smaller than 32
    KB

    Store Doubleword with Update Indexed instruction where the
    negative size of the stack frame has been computed, using
    addis and addi or ori instructions, and then loaded into a
    volatile register, for stack frames that are 32 KB or
    greater

The function shall save the link register that contains its
return address in the LR save doubleword of its caller's stack
frame before calling another function.

The deallocation of a function's stack frame must be an
atomic operation. This task can be accomplished by one of the
following methods:

    Increment the stack pointer by the identical value that it
    was originally decremented by in the prologue when the stack frame
    was created.

    Load the stack pointer (r1) with the value in the back-chain
    word in the stack frame, if a back chain is present.

The calling sequence does not restrict how languages leverage
the Local Variable Space of the stack frame. There is no
restriction on the size of this section.

The Parameter Save Area shall be allocated by the caller. It
shall be large enough to contain the parameters needed by the
caller if a Parameter Save Area is needed (as described in
). Its contents are not
saved across function calls.

If any nonvolatile registers are to be used by the function,
the contents of the register must be saved into a register save
area. See
for information on all of
the optional register save areas.

Saving or restoring nonvolatile registers used by the function
can be accomplished by using in-line code. Alternately, one of the
system subroutines described in
may offer a more efficient alternative
to in-line code, especially in cases where there are many registers to
be saved or restored.

Register Save and Restore Functions

This section describes functions that can be used to save and
restore the contents of nonvolatile registers. Using these routines,
rather than performing these saves and restores inline in the prologue
and epilogue of functions, can help reduce the code footprint. The
calling conventions of these functions are not standard, and the
executables or shared objects that use these functions must statically
link them.

The register save and restore functions affect consecutive
registers from register N through register 31, where N represents a
number between 14 and 31. Higher-numbered registers are saved at higher
addresses within a save area. Each function described in this section is
a family of functions with identical behavior except for the number and
kind of registers affected.

Systems must provide three pairs of functions to save and restore
general-purpose, floating-point, and vector registers. They may be
implemented as multiple-entry-point routines or as individual routines.
The specific calling conventions for each of these functions are
described in
,
, and
. Visibility rules are
described in
.

GPR Save and Restore Functions

Each _savegpr0_N routine saves the general registers from
rN–r31, inclusive. Each routine also saves the LR.
The stack frame must not have been allocated yet. When the routine is
called, r1 contains the address of the word immediately beyond the end
of the general register save area, and r0 must contain the value of the
LR on function entry.

The _restgpr0_N routines restore the general registers from
rN–r31, and then return to their caller's caller.
The caller's stack frame must already have been deallocated. When the
routine is called, r1 contains the address of the word immediately
beyond the end of the general register save area, and the LR must
contain the return address.

A sample implementation of _savegpr0_N and
_restgpr0_N follows:

_savegpr0_14: std r14,-144(r1)
_savegpr0_15: std r15,-136(r1)
_savegpr0_16: std r16,-128(r1)
_savegpr0_17: std r17,-120(r1)
_savegpr0_18: std r18,-112(r1)
_savegpr0_19: std r19,-104(r1)
_savegpr0_20: std r20,-96(r1)
_savegpr0_21: std r21,-88(r1)
_savegpr0_22: std r22,-80(r1)
_savegpr0_23: std r23,-72(r1)
_savegpr0_24: std r24,-64(r1)
_savegpr0_25: std r25,-56(r1)
_savegpr0_26: std r26,-48(r1)
_savegpr0_27: std r27,-40(r1)
_savegpr0_28: std r28,-32(r1)
_savegpr0_29: std r29,-24(r1)
_savegpr0_30: std r30,-16(r1)
_savegpr0_31: std r31,-8(r1)
std r0, 16(r1)
blr
_restgpr0_14: ld r14,-144(r1)
_restgpr0_15: ld r15,-136(r1)
_restgpr0_16: ld r16,-128(r1)
_restgpr0_17: ld r17,-120(r1)
_restgpr0_18: ld r18,-112(r1)
_restgpr0_19: ld r19,-104(r1)
_restgpr0_20: ld r20,-96(r1)
_restgpr0_21: ld r21,-88(r1)
_restgpr0_22: ld r22,-80(r1)
_restgpr0_23: ld r23,-72(r1)
_restgpr0_24: ld r24,-64(r1)
_restgpr0_25: ld r25,-56(r1)
_restgpr0_26: ld r26,-48(r1)
_restgpr0_27: ld r27,-40(r1)
_restgpr0_28: ld r28,-32(r1)
_restgpr0_29: ld r0, 16(r1)
ld r29,-24(r1)
mtlr r0
ld r30,-16(r1)
ld r31,-8(r1)
blr
_restgpr0_30: ld r30,-16(r1)
_restgpr0_31: ld r0, 16(r1)
ld r31,-8(r1)
mtlr r0
blr

Each _savegpr1_N routine saves the general registers from
rN–r31, inclusive. When the routine is called, r12 contains the address of
the word just beyond the end of the general register save area.

The _restgpr1_N routines restore the general registers from
rN–r31. When the routine is called, r12 contains the address of the word
just beyond the end of the general register save area, superseding the
normal use of r12 on a call.

A sample implementation of _savegpr1_N and _restgpr1_N
follows:

_savegpr1_14: std r14,-144(r12)
_savegpr1_15: std r15,-136(r12)
_savegpr1_16: std r16,-128(r12)
_savegpr1_17: std r17,-120(r12)
_savegpr1_18: std r18,-112(r12)
_savegpr1_19: std r19,-104(r12)
_savegpr1_20: std r20,-96(r12)
_savegpr1_21: std r21,-88(r12)
_savegpr1_22: std r22,-80(r12)
_savegpr1_23: std r23,-72(r12)
_savegpr1_24: std r24,-64(r12)
_savegpr1_25: std r25,-56(r12)
_savegpr1_26: std r26,-48(r12)
_savegpr1_27: std r27,-40(r12)
_savegpr1_28: std r28,-32(r12)
_savegpr1_29: std r29,-24(r12)
_savegpr1_30: std r30,-16(r12)
_savegpr1_31: std r31,-8(r12)
blr
_restgpr1_14: ld r14,-144(r12)
_restgpr1_15: ld r15,-136(r12)
_restgpr1_16: ld r16,-128(r12)
_restgpr1_17: ld r17,-120(r12)
_restgpr1_18: ld r18,-112(r12)
_restgpr1_19: ld r19,-104(r12)
_restgpr1_20: ld r20,-96(r12)
_restgpr1_21: ld r21,-88(r12)
_restgpr1_22: ld r22,-80(r12)
_restgpr1_23: ld r23,-72(r12)
_restgpr1_24: ld r24,-64(r12)
_restgpr1_25: ld r25,-56(r12)
_restgpr1_26: ld r26,-48(r12)
_restgpr1_27: ld r27,-40(r12)
_restgpr1_28: ld r28,-32(r12)
_restgpr1_29: ld r29,-24(r12)
_restgpr1_30: ld r30,-16(r12)
_restgpr1_31: ld r31,-8(r12)
blr

FPR Save and Restore Functions

Each _savefpr_N routine saves the floating-point registers from
fN–f31, inclusive. When the routine is called, r1
contains the address of the word immediately beyond the end of the
Floating-Point Register Save Area, which means that the stack frame
must not have been allocated yet. Register r0 must contain the value of
the LR on function entry.

The _restfpr_N routines restore the floating-point registers
from fN–f31, inclusive. When the routine is called, r1
contains the address of the word immediately beyond the end of the
Floating-Point Register Save Area, which means that the stack frame
must already have been deallocated.

It is incorrect to call both _savefpr_M and _savegpr0_M in the
same prologue, or _restfpr_M and _restgpr0_M in the same epilogue. It
is correct to call _savegpr1_M and _savefpr_M in either order, and to
call _restgpr1_M and then _restfpr_M.

A sample implementation of _savefpr_N and
_restfpr_N follows:

_savefpr_14: stfd f14,-144(r1)
_savefpr_15: stfd f15,-136(r1)
_savefpr_16: stfd f16,-128(r1)
_savefpr_17: stfd f17,-120(r1)
_savefpr_18: stfd f18,-112(r1)
_savefpr_19: stfd f19,-104(r1)
_savefpr_20: stfd f20,-96(r1)
_savefpr_21: stfd f21,-88(r1)
_savefpr_22: stfd f22,-80(r1)
_savefpr_23: stfd f23,-72(r1)
_savefpr_24: stfd f24,-64(r1)
_savefpr_25: stfd f25,-56(r1)
_savefpr_26: stfd f26,-48(r1)
_savefpr_27: stfd f27,-40(r1)
_savefpr_28: stfd f28,-32(r1)
_savefpr_29: stfd f29,-24(r1)
_savefpr_30: stfd f30,-16(r1)
_savefpr_31: stfd f31,-8(r1)
std r0, 16(r1)
blr
_restfpr_14: lfd f14,-144(r1)
_restfpr_15: lfd f15,-136(r1)
_restfpr_16: lfd f16,-128(r1)
_restfpr_17: lfd f17,-120(r1)
_restfpr_18: lfd f18,-112(r1)
_restfpr_19: lfd f19,-104(r1)
_restfpr_20: lfd f20,-96(r1)
_restfpr_21: lfd f21,-88(r1)
_restfpr_22: lfd f22,-80(r1)
_restfpr_23: lfd f23,-72(r1)
_restfpr_24: lfd f24,-64(r1)
_restfpr_25: lfd f25,-56(r1)
_restfpr_26: lfd f26,-48(r1)
_restfpr_27: lfd f27,-40(r1)
_restfpr_28: lfd f28,-32(r1)
_restfpr_29: ld r0, 16(r1)
lfd f29,-24(r1)
mtlr r0
lfd f30,-16(r1)
lfd f31,-8(r1)
blr
_restfpr_30: lfd f30,-16(r1)
_restfpr_31: ld r0, 16(r1)
lfd f31,-8(r1)
mtlr r0
blr

Vector Save and Restore Functions

Each _savevr_M routine saves the vector registers from vM–v31,
inclusive. On entry to this function, r0 contains the address of the
word just beyond the end of the Vector Register Save Area. The routines
leave r0 undisturbed. They modify the value of r12.

The _restvr_M routines restore the vector registers from vM–v31,
inclusive. On entry to this function, r0 contains the address of the
word just beyond the end of the Vector Register Save Area. The routines
leave r0 undisturbed. They modify the value of r12. The following code
is an example of restoring a vector register.

It is valid to call _savevr_M before any of the other register
save functions, or after _savegpr1_M. It is valid to call _restvr_M
before any of the other register restore functions, or after
_restgpr1_M.

A sample implementation of _savevr_M and _restvr_M
follows:
_savevr_20: addi r12,r0,-192
stvx v20,r12,r0 # save v20
_savevr_21: addi r12,r0,-176
stvx v21,r12,r0 # save v21
_savevr_22: addi r12,r0,-160
stvx v22,r12,r0 # save v22
_savevr_23: addi r12,r0,-144
stvx v23,r12,r0 # save v23
_savevr_24: addi r12,r0,-128
stvx v24,r12,r0 # save v24
_savevr_25: addi r12,r0,-112
stvx v25,r12,r0 # save v25
_savevr_26: addi r12,r0,-96
stvx v26,r12,r0 # save v26
_savevr_27: addi r12,r0,-80
stvx v27,r12,r0 # save v27
_savevr_28: addi r12,r0,-64
stvx v28,r12,r0 # save v28
_savevr_29: addi r12,r0,-48
stvx v29,r12,r0 # save v29
_savevr_30: addi r12,r0,-32
stvx v30,r12,r0 # save v30
_savevr_31: addi r12,r0,-16
stvx v31,r12,r0 # save v31
blr # return to prologue
_restvr_20: addi r12,r0,-192
lvx v20,r12,r0 # restore v20
_restvr_21: addi r12,r0,-176
lvx v21,r12,r0 # restore v21
_restvr_22: addi r12,r0,-160
lvx v22,r12,r0 # restore v22
_restvr_23: addi r12,r0,-144
lvx v23,r12,r0 # restore v23
_restvr_24: addi r12,r0,-128
lvx v24,r12,r0 # restore v24
_restvr_25: addi r12,r0,-112
lvx v25,r12,r0 # restore v25
_restvr_26: addi r12,r0,-96
lvx v26,r12,r0 # restore v26
_restvr_27: addi r12,r0,-80
lvx v27,r12,r0 # restore v27
_restvr_28: addi r12,r0,-64
lvx v28,r12,r0 # restore v28
_restvr_29: addi r12,r0,-48
lvx v29,r12,r0 # restore v29
_restvr_30: addi r12,r0,-32
lvx v30,r12,r0 # restore v30
_restvr_31: addi r12,r0,-16
lvx v31,r12,r0 # restore v31
blr # return to epilogue

Function Pointers

A function's address is defined to be its global entry point.
Function pointers shall contain the global entry-point address.

Static Data Objects

Data objects with static storage duration are described here.
Stack-resident data objects are omitted because the virtual addresses of
stack-resident data objects are derived relative to the stack or frame
pointers. Heap data objects are omitted because they are accessed via a
program pointer.

The only instructions that can access memory in the Power
Architecture are load and store instructions. Programs typically access
memory by placing the address of the memory location into a register and
accessing the memory location indirectly through the register because
Power Architecture instructions cannot hold 64-bit addresses directly.
The values of symbols or their absolute virtual addresses are placed
directly into instructions for symbolic references in absolute
code. shows an example of this
method.

Examples of absolute and position-independent compilations are
shown in
,
, and
. These examples show the C
language statements together with the generated assembly language. The
assumption for these figures is that only executables can use absolute
addressing while shared objects must use position-independent code
addressing. The figures are intended to demonstrate the compilation of
each C statement independent of its context; hence, there can be
redundant operations in the code.

Absolute addressing efficiency depends on the memory-region
addresses:

Top 32 KB: Addressed directly with load and store D forms.
Top 2 GB: Addressed by a two-instruction sequence consisting of an
lis with load and store D forms.
Remaining addresses: More than two instructions.
Bottom 2 GB: Addressed by a two-instruction sequence consisting of an
lis with load and store D forms.
Bottom 32 KB: Addressed directly with load and store D forms.
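As a rough illustration of the regions listed above, the following C sketch classifies a 64-bit absolute address by the length of the instruction sequence needed to reach it. The function name classify_absolute and the enum labels are hypothetical and are not part of this ABI. The sketch assumes that a D-form displacement is a sign-extended 16-bit value and that lis contributes a sign-extended high part, so the exact two-instruction boundaries differ from the round 2 GB figures by 32 KB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three cases in the list above. */
enum abs_form {
    ABS_D_FORM_ONLY = 1,  /* one load/store with a 16-bit displacement */
    ABS_LIS_PLUS_D  = 2,  /* lis followed by a D-form load/store */
    ABS_LONGER      = 3   /* more than two instructions */
};

static enum abs_form classify_absolute(uint64_t address)
{
    /* Reinterpret as signed so that the "top" regions become small
       negative values (assumes two's complement, as on Power). */
    int64_t a = (int64_t)address;

    /* Bottom 32 KB and top 32 KB: displacement alone is enough. */
    if (a >= -0x8000 && a <= 0x7FFF)
        return ABS_D_FORM_ONLY;

    /* lis supplies a signed 16-bit high part; the D-form displacement
       then adds a signed 16-bit low part. */
    if (a >= -0x80008000LL && a <= 0x7FFF7FFFLL)
        return ABS_LIS_PLUS_D;

    return ABS_LONGER;
}
```

For example, classify_absolute(0x7FFF) yields ABS_D_FORM_ONLY, while classify_absolute(0x100000000) yields ABS_LONGER.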
Absolute Load and Store Example

C Code:

extern int src;
extern int dst;
extern int *ptr;
dst = src;
ptr = &dst;
*ptr = src;

Assembly Code:

.extern src
.extern dst
.extern ptr
.section ".text"
lis r9,src@ha
lwz r9,src@l(r9)
lis r11,dst@ha
stw r9,dst@l(r11)
lis r11,ptr@ha
lis r9,dst@ha
la r9,dst@l(r9)
std r9,ptr@l(r11)
lis r11,ptr@ha
ld r11,ptr@l(r11)
lis r9,src@ha
lwz r9,src@l(r9)
stw r9,0(r11)
Small Model Position-Independent Load and Store (DSO)

C Code:

extern int src;
extern int dst;
extern int *ptr;
dst = src;
ptr = &dst;
*ptr = src;

Assembly Code:

.extern src
.extern dst
.extern ptr
.section ".text"
# TOC base in r2
ld r9,src@got(2)
lwz r0,0(r9)
ld r9,dst@got(r2)
stw r0,0(r9)
ld r9,ptr@got(r2)
ld r0,dst@got(r2)
std r0,0(r9)
ld r9,ptr@got(r2)
ld r11,0(r9)
ld r9,src@got(r2)
lwz r0,0(r9)
stw r0,0(r11)

Medium or Large Model Position-Independent Load and Store (DSO)

C Code:
extern int src;
extern int dst;
extern int *ptr;
dst = src;
ptr = &dst;
*ptr = src;

Assembly Code:
.extern src
.extern dst
.extern ptr
.section ".text"
# Assumes TOC pointer in r2
# dst = src;
addis r6,r2,src@got@ha
ld r6,src@got@l(r6)
addis r7,r2,dst@got@ha
ld r7,dst@got@l(r7)
lwz r0,0(r6)
stw r0,0(r7)
# ptr = &dst; (the address is 64 bits wide, so it is stored with std)
addis r6,r2,dst@got@ha
ld r6,dst@got@l(r6)
addis r7,r2,ptr@got@ha
ld r7,ptr@got@l(r7)
std r6,0(r7)
# *ptr = src;
addis r6,r2,src@got@ha
ld r6,src@got@l(r6)
addis r7,r2,ptr@got@ha
ld r7,ptr@got@l(r7)
ld r7,0(r7)
lwz r0,0(r6)
stw r0,0(r7)
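
The addis and ld pairs in the medium or large model example depend on the high-adjusted (@ha) and low (@l) parts of an offset recombining to the original value. The following C sketch models that arithmetic. The function names part_lo, part_ha, and toc_relative_ea are hypothetical, and the sketch assumes the usual definitions: @l is the low 16 bits of the offset, which ld treats as a signed displacement, and @ha is the next 16 bits after 0x8000 has been added to compensate for that sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v to 64 bits. */
static int64_t sext16(uint64_t v)
{
    int64_t x = (int64_t)(v & 0xffffu);
    return (x ^ 0x8000) - 0x8000;
}

/* Low part (@l): the low 16 bits of the offset. */
static uint16_t part_lo(int64_t off)
{
    return (uint16_t)((uint64_t)off & 0xffffu);
}

/* High-adjusted part (@ha): bits 16..31 after adding 0x8000. */
static uint16_t part_ha(int64_t off)
{
    return (uint16_t)((((uint64_t)off + 0x8000u) >> 16) & 0xffffu);
}

/*
 * Model of:  addis rT,r2,off@ha ; ld rT,off@l(rT)
 * Returns the effective address that the ld would access.
 */
static uint64_t toc_relative_ea(uint64_t toc_base, int64_t off)
{
    /* The pair only reaches offsets expressible as ha * 65536 + lo. */
    assert(off >= -INT64_C(0x80008000) && off <= INT64_C(0x7fff7fff));

    /* addis adds the sign-extended immediate shifted left by 16 bits. */
    uint64_t after_addis = toc_base + (uint64_t)(sext16(part_ha(off)) * 65536);

    /* ld adds the sign-extended 16-bit displacement. */
    return after_addis + (uint64_t)sext16(part_lo(off));
}
```

With an offset of 0x18000, the low part 0x8000 is a negative displacement, so the high-adjusted part is 2 rather than 1, and the two instructions still arrive at the base plus 0x18000.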

Due to fusion hardware support, the preferred code forms are
destructive addressing forms with an addis specifying a set of
high-order bits followed immediately by a destructive load using
the same target register as the addis instruction to load data from
a signed 32-bit offset from a base register.

Note: Destructive in this context refers to a code sequence where
the first intermediate result computed by a first instruction is
overwritten (that is, "destroyed") by the result of a second
instruction so that only one result register is produced. Fusion
can then give the same performance as a single load instruction
with a 32-bit displacement.

For PIC code (see the Small Model Position-Independent Load and Store
(DSO) example and the Medium or Large Model Position-Independent Load
and Store (DSO) example), the offset in the
Global Offset Table where the value of the symbol is stored is
given by the assembly syntax symbol@got. This syntax represents the
address of the variable named "symbol."

The offset for this assembly syntax cannot be any larger than 16
bits. In cases where the offset is greater than 16 bits, the following
assembly syntax is used for offsets up to 32 bits:

High (32-bit) adjusted part of the offset: symbol@got@ha (causes a linker error if the offset is larger than 32 bits)
High (32-bit) part of the offset: symbol@got@h (causes a linker error if the offset is larger than 32 bits)
Low part of the offset: symbol@got@l

To obtain the multiple 16-bit segments of a 64-bit offset, the
following operators may be used:

Highest (most-significant 16 bits) adjusted part of the offset: symbol@highesta
Highest (most-significant 16 bits) part of the offset: symbol@highest
Higher (next significant 16 bits) adjusted part of the offset: symbol@highera
Higher (next significant 16 bits) part of the offset: symbol@higher
High (next significant 16 bits) adjusted part of the offset: symbol@higha
High (next significant 16 bits) part of the offset: symbol@high
Low part of the offset: symbol@l

If the instruction using symbol@got@l has a signed immediate operand (for example,
addi), use symbol@got@ha (high adjusted) for the high part of the offset.
If it has an unsigned immediate operand (for example, ori), use
symbol@got@h. For a description of high-adjusted values, see
.

Function Calls

Direct function calls are made in programs with the Power
Architecture bl instruction. A bl instruction can reach 32 MB backwards
or forwards from the current position due to a self-relative branch
displacement in the instruction. Therefore, the size of the text segment
in an executable or shared object is constrained when a bl instruction is
used to make a function call. When the distance of the called function
exceeds the displacement reach of the bl instruction, a linker
implementation may either introduce branch trampoline code to extend
function call distances or issue a link error.

As shown in
, the bl instruction is
generally used to call a local function.

Two possibilities exist for the location of the function with
respect to the caller:

The called function is in the same executable or shared object
as the caller. In this case, the symbol is resolved by the link
editor and the bl instruction branches directly to the called
function as shown in
.

The called function is not in the same executable or shared
object as the caller. In this case, the symbol cannot be directly
resolved by the link editor. The link editor generates a branch to
glue code that loads the address of the function from the Procedure
Linkage Table. See
.

For indirect function calls, the address of the function to be
called is placed in r12 and the CTR register. A bctrl instruction is used
to perform the indirect branch as shown in
, and
. The ELF V2 ABI requires the
address of the called function to be in r12 when a cross-module function
call is made. shows how to make an indirect
function call using small-model position-independent code. shows how to make an indirect
function call using large-model position-independent code.

Function calls need to be performed in conjunction with
establishing, maintaining, and restoring addressability through the TOC
pointer register, r2. When a function is called, the TOC pointer register
may be modified. The caller must provide a nop after the bl instruction
performing a call, if r2 is not known to have the same value in the
callee. This is generally true for external calls. The linker will
replace the nop with an r2 restoring instruction if the caller and callee
use different r2 values. The linker leaves it unchanged if they use the
same r2 value. This scheme avoids having a compiler generate an
overconservative r2 save and restore around every external call.

For calls to functions resolved at runtime, the linker must
generate stub code to load the function address from the PLT.

The stub code also must save r2 to 24(r1) unless the call is marked
with an R_PPC64_TOCSAVE relocation that points to a nop provided in the
caller's prologue. In that case, the stub code can omit the r2 save.
Instead, the linker replaces the prologue nop with an r2 save.

tocsaveloc:
nop
...
bl target
.reloc ., R_PPC64_TOCSAVE, tocsaveloc
nop

The linker may assume that r2 is valid at the point of a call.
Thus, stub code may use r2 to load an address from the PLT unless the
call is marked with an R_PPC64_REL24_NOTOC relocation to indicate that r2
is not available.

The nop instruction must be:

ori r0,r0,0

For more information, see
,
, and
.

Branching

The flow of execution in a program is controlled by the use of
branch instructions. Unconditional branch instructions can jump to
locations up to 32 MB in either direction because they hold a signed
value with a 64 MB range that is relative to the current location of the
program execution. shows the model for branch
instructions.

Selecting one of multiple branches is accomplished in C with switch
statements. An address table is used by the compiler to implement the
switch statement selections in cases where the case labels satisfy
grouping constraints. In the examples that follow, details that are not
relevant are avoided by the use of the following simplifying
assumptions:

r12 holds the selection expression.
Case label constants begin at zero.
The assembler names .Lcasei, .Ldefault, and .Ltab are used for the case labels, the default, and the address table respectively.

For position-dependent code (for example, the main module of an
application) loaded into the low or high address range, absolute
addressing of a branch table yields the best performance.

A faster variant of this code may be used to locate branch
targets in the bottom 2 GB of the address space in conjunction with the
lwz instruction in place of the lwa instruction.

For position-independent code targeted at being dynamically loaded
to different address ranges as a DSO, the preferred code pattern uses
TOC-relative addressing by taking advantage of the fact that the TOC
pointer points to a fixed offset from the code segment. The use of
relative offsets from the start address of the branch table ensures
position independence when code is loaded at different addresses.

For position-independent code targeted at being dynamically loaded
to different address ranges as a DSO or a position-independent executable
(PIE), the preferred code pattern uses TOC-indirect addresses for code
models where the distance between the TOC and the branch table exceeds 2
GB. The use of relative offsets from the start address of the branch
table ensures position independence when code is loaded at different
addresses. shows how, in the medium code
model, PIC code can be used to avoid using the lwa instruction, which may
result in lower performance in some POWER processor
implementations.

Dynamic Stack Space Allocation

When allocated, a stack frame may be grown or shrunk dynamically as
many times as necessary across the lifetime of a function. Standard
calling conventions must be maintained because a subfunction can be
called after the current frame is grown and that subfunction may stack,
grow, shrink, and tear down a frame between dynamic stack frame
allocations of the caller. The following constraints apply when
dynamically growing or shrinking a stack frame:

Maintain 16-byte alignment.
Stack pointer adjustments shall be performed atomically so that at all times the value of the back-chain word is valid, when a back chain is used.
Maintain addressability to the previously allocated local variables in the presence of multiple dynamic allocations or conditional allocations.
Ensure that other linkage information is correct, so that the function can return or its stack space can be deallocated by exception handling without deallocating any dynamically allocated space.

Using a frame pointer is the recognized method for
maintaining addressability to arguments or local variables. (This may
be a pointer to the top of the stack frame, typically in r31.) For
correct behavior in the cases of setjmp() and longjmp(), the frame
pointer shall be allocated in a nonvolatile general-purpose
register. shows the organization of a
stack frame before a dynamic allocation.

Because it is allowed (and common) to return without first
deallocating this dynamically allocated memory, all the linkage
information in the new location must be valid. Therefore, it is also
necessary to copy the CR save word and the TOC pointer doubleword from
their old locations to the new. It is not necessary to copy the LR save
doubleword because, until this function makes a call, it does not contain
a value that needs to be preserved. In the future, if it is defined and
if the function uses the Reserved word, the LR save doubleword must also
be copied.

Additional instructions will be necessary for an allocation of
variable size. If a dynamic deallocation will occur, the r1 stack
pointer must be saved before the dynamic allocation, and r1 must be reset
to that saved value by the deallocation. The deallocation does not need to copy any
stack locations because the old ones should still be valid. shows an example organization
of a stack frame after a dynamic allocation.

DWARF Definition

Although this ABI itself does not define a debugging format, debug
with arbitrary record format (DWARF) is defined here for systems that
implement the DWARF specification. For information about how to locate the
specification, see
.

The DWARF specification is used by compilers and debuggers to aid
source-level or symbolic debugging. However, the format is not biased toward
any particular compiler or debugger. Per the DWARF specification, a
mapping from Power Architecture registers to register numbers is required as
described in the Mappings of Common Registers table.

All instances of the Power Architecture use the mapping shown in
that table for encoding registers into
DWARF. DWARF register numbers 32–63 and 77–108 are also used to
indicate the location of variables in VSX registers vsr0–vsr31 and
vsr32–vsr63, respectively, in DWARF debug information.
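
The VSX numbering described in the previous sentence can be written as a one-line mapping. The helper below is a hypothetical illustration (the name vsr_to_dwarf is not defined by this ABI or by DWARF); it only restates the two ranges given above.

```c
#include <assert.h>

/*
 * Hypothetical helper: DWARF register number used for VSX register vsrN.
 * vsr0-vsr31 share DWARF numbers 32-63, and vsr32-vsr63 share DWARF
 * numbers 77-108.
 */
static int vsr_to_dwarf(int vsr)
{
    assert(vsr >= 0 && vsr <= 63);
    return vsr < 32 ? 32 + vsr : 77 + (vsr - 32);
}
```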

Mappings of Common Registers

DWARF Register Number   Register Name   Register Width (Bytes)
Reg0–31                 r0–r31          8
Reg32–63                f0–f31          8
Reg64                   Reserved        N/A
Reg65                   lr              8
Reg66                   ctr             8
Reg67                   Reserved        N/A
Reg68–75                cr0–cr7         0.5 (see note)
Reg76                   xer             4
Reg77–108               vr0–vr31        16
Reg109                  Reserved        N/A
Reg110                  vscr            8
Reg111                  Reserved        N/A
Reg112                  Reserved        N/A
Reg113                  Reserved        N/A
Reg114                  tfhar           8
Reg115                  tfiar           8
Reg116                  texasr          8

Note: The CRx registers correspond to 4-bit fields within a
word where the offset of the 4-bit group within a word is a
function of the CRFx number (x).
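
A consumer of the register mapping above might encode it as a lookup function. The following C sketch is one possible shape, not an interface defined by this ABI: dwarf_reg_describe and cr_field are hypothetical names, widths are reported in bits so that the 4-bit CR fields need no fractional value, and cr_field assumes the Power ISA convention that CR0 occupies the most-significant four bits of the 32-bit Condition Register.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical lookup: writes the register name for a DWARF register
 * number into buf and returns the width in bits, or 0 if the number is
 * reserved or not listed in the table.
 */
static int dwarf_reg_describe(int regno, char *buf, size_t len)
{
    if (regno >= 0 && regno <= 31)   { snprintf(buf, len, "r%d", regno);       return 64; }
    if (regno >= 32 && regno <= 63)  { snprintf(buf, len, "f%d", regno - 32);  return 64; }
    if (regno >= 68 && regno <= 75)  { snprintf(buf, len, "cr%d", regno - 68); return 4; }
    if (regno >= 77 && regno <= 108) { snprintf(buf, len, "vr%d", regno - 77); return 128; }

    switch (regno) {
    case 65:  snprintf(buf, len, "lr");     return 64;
    case 66:  snprintf(buf, len, "ctr");    return 64;
    case 76:  snprintf(buf, len, "xer");    return 32;
    case 110: snprintf(buf, len, "vscr");   return 64;
    case 114: snprintf(buf, len, "tfhar");  return 64;
    case 115: snprintf(buf, len, "tfiar");  return 64;
    case 116: snprintf(buf, len, "texasr"); return 64;
    default:  snprintf(buf, len, "reserved"); return 0;
    }
}

/* Extract CR field x (0..7) from a 32-bit Condition Register image. */
static unsigned cr_field(uint32_t cr, int x)
{
    assert(x >= 0 && x <= 7);
    return (cr >> (28 - 4 * x)) & 0xfu;
}
```

For instance, DWARF register 70 resolves to cr2 with a width of 4 bits, and cr_field(cr, 2) selects the corresponding 4-bit group from a saved CR word.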

DWARF for the OpenPOWER ABI defines the address class codes described
in the Address Class Codes table.

Address Class Codes

Code        Value   Meaning
ADDR_none   0       No class specified

Exception Handling

Where exceptions can be thrown or caught by a function, or thrown
through that function, or where a thread can be canceled from within a
function, the locations where nonvolatile registers have been saved must be
described with unwind information. The format of this information is based
on the DWARF call frame information with extensions.

Any implementation that generates unwind information must also
provide exception handling functions that are the same as those described
in the Itanium C++ ABI, the normative text on the issue. For information
about how to locate this material, see
.