Call Function Definition

This section specifies the semantics of all the RTAS calls. For each call, it gives the RTAS function name, the contents of its argument call buffer (its token, input parameters, and output values), and its semantics.

NVRAM Access Functions This architecture requires an area of non-volatile memory (NVRAM) to hold OF options, RTAS information, machine configuration state, OS state, diagnostic logs, etc. The type and size of NVRAM is specified in the OF device tree. The format of NVRAM is detailed in . In order to give the OS the ability to access NVRAM on different platforms that may use different implementations or locations for NVRAM, a layer of abstraction is provided to the OS. The functions in this section provide an interface for reading and writing NVRAM with byte level operations with no boundary requirements.

<emphasis>nvram-fetch</emphasis>

The RTAS function nvram-fetch copies data from a given offset in NVRAM into the user specified buffer.

R1--1. RTAS must implement an nvram-fetch function that returns data from NVRAM using the argument call buffer defined by .

Argument Call Buffer <emphasis>nvram-fetch</emphasis>

Parameter Type  Name            Values
In              Token           Token for nvram-fetch
In              Number Inputs   3
In              Number Outputs  2
In              Index           Byte offset in NVRAM
In              Buffer          Real address of data buffer
In              Length          Size of data buffer (in bytes)
Out             Status          0: Success
                                -1: Hardware Error
                                -3: Parameter out of range
Out             Num             Number of bytes successfully copied

<emphasis>nvram-store</emphasis>

The RTAS function nvram-store copies data from the user specified buffer to a given offset in NVRAM.

R1--1. RTAS must implement an nvram-store function that stores data in NVRAM using the argument call buffer defined by .

Argument Call Buffer <emphasis>nvram-store</emphasis>

Parameter Type  Name            Values
In              Token           Token for nvram-store
In              Number Inputs   3
In              Number Outputs  2
In              Index           Byte number in NVRAM
In              Buffer          Real address of data buffer
In              Length          Size of data buffer (in bytes)
Out             Status          0: Success
                                -1: Hardware Error
                                -3: Parameter out of range
Out             Num             Number of bytes successfully copied

R1--2. If the nvram-store operation succeeded, the contents of NVRAM must have been updated to the user specified values. The contents of NVRAM are undefined if the RTAS call failed. Platform Implementation Note: The platform may keep the NVRAM data cached in volatile memory as long as the cache is implemented as a store-through cache and not a store-in cache. That is, changed data is written to NVRAM as soon as possible. Return from the nvram-store call with a “success” Status is permissible after placing the data into a store-through cache and prior to the actual writing to the NVRAM. R1--3. The caller of the nvram-store RTAS call must maintain the NVRAM partitions as specified in .
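The byte-level semantics of nvram-fetch and nvram-store can be illustrated with a small in-memory model. This is only a sketch: the Status values come from the argument call buffers above, while the class name, method names, and bytearray backing store are hypothetical. It also assumes that an out-of-range request is rejected as a whole rather than partially copied, which the text above does not state.

```python
# Hypothetical in-memory model of the nvram-fetch / nvram-store semantics.
# Only the Status values are taken from the specification text; the rest
# (names, backing store, all-or-nothing range check) is assumed.
SUCCESS = 0
HARDWARE_ERROR = -1
PARAMETER_OUT_OF_RANGE = -3


class NvramModel:
    def __init__(self, size):
        self.cells = bytearray(size)

    def _in_range(self, index, length):
        # Byte-level access with no boundary (alignment) requirements.
        return index >= 0 and length >= 0 and index + length <= len(self.cells)

    def fetch(self, index, buffer, length):
        """Copy `length` bytes from NVRAM offset `index` into `buffer`.

        Returns (Status, Num)."""
        if length > len(buffer) or not self._in_range(index, length):
            return PARAMETER_OUT_OF_RANGE, 0
        buffer[:length] = self.cells[index:index + length]
        return SUCCESS, length

    def store(self, index, buffer, length):
        """Copy `length` bytes from `buffer` to NVRAM offset `index`.

        Returns (Status, Num)."""
        if length > len(buffer) or not self._in_range(index, length):
            return PARAMETER_OUT_OF_RANGE, 0
        self.cells[index:index + length] = buffer[:length]
        return SUCCESS, length
```

A caller would check Status before trusting either the buffer contents or Num, since the contents of NVRAM are undefined after a failed nvram-store.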

Time of Day The minimum system requirements include a non-volatile real time clock which maintains the time of day even if power to the machine is removed. Minimum requirements for this clock are described in Requirement .

Time of Day Inputs/Outputs The OS maintains the clock in UTC. This allows the OS and diagnostics to co-exist with each other and provide uniform handling of time. R1--1. The date and time inputs and outputs to the RTAS time of day function calls are specified with the year as the actual value (for example, 1995), the month as a value in the range 1-12, the day as a value in the range 1-31, the hour as a value in the range 0-23, the minute as a value in the range 0-59, and the second as a value in the range 0-59. The date must also be a valid date according to common usage: the day range being restricted for certain months, month 2 having 29 days in leap years, etc. R1--2. OSs must account for local time, for daylight savings time when and where appropriate, and for leap seconds. R1--3. RTAS must account for leap years.
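The range and calendar rules in the first requirement above can be checked with a few lines of code. The following is a sketch only; the function name is hypothetical, and it relies on Python's standard calendar module for the month lengths and leap-year rule.

```python
import calendar


def is_valid_rtas_date(year, month, day, hour, minute, second):
    """Check the time-of-day field ranges, including month length and leap years."""
    if not 1 <= month <= 12:
        return False
    # monthrange returns (weekday of first day, number of days in month).
    days_in_month = calendar.monthrange(year, month)[1]
    if not 1 <= day <= days_in_month:
        return False
    return 0 <= hour <= 23 and 0 <= minute <= 59 and 0 <= second <= 59
```

Because the second field is limited to 0-59, a leap second cannot be expressed in these parameters, which is consistent with leap seconds being an OS responsibility.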

<emphasis>get-time-of-day</emphasis>

R1--1. RTAS must implement a get-time-of-day call using the argument call buffer defined by .

Argument Call Buffer <emphasis>get-time-of-day</emphasis>

Parameter Type  Name            Values
In              Token           Token for get-time-of-day
In              Number Inputs   0
In              Number Outputs  8
Out             Status          990x: Extended Delay where x is a number 0-5 (see text below)
                                0: Success
                                -1: Hardware Error
                                -2: Clock Busy, Try again later
Out             Year            Year
Out             Month           1-12
Out             Day             1-31
Out             Hour            0-23
Out             Minute          0-59
Out             Second          0-59
Out             Nanoseconds     0-999999999

Software Implementation Note: When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling again. However, software may issue the call again either earlier or later than this. R1--2. RTAS must read the current time and set the output values to the best resolution provided by the platform.
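The suggested delay for a 990x Status is 10 raised to the power x, in milliseconds. A minimal sketch of that decoding follows; the function name is hypothetical, and returning None for any other Status is a choice made for this example.

```python
def extended_delay_ms(status):
    """Return the suggested delay in milliseconds for a 990x Status.

    Returns None when `status` is not in the range 9900-9905."""
    if 9900 <= status <= 9905:
        x = status - 9900  # last digit of the 990x return code
        return 10 ** x
    return None
```

For example, a Status of 9903 suggests a delay of 1000 milliseconds. The delay is only a suggestion; software may call again earlier or later.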

<emphasis>set-time-of-day</emphasis>

R1--1. RTAS must implement a set-time-of-day call using the argument call buffer defined by .

Argument Call Buffer <emphasis>set-time-of-day</emphasis>

Parameter Type  Name            Values
In              Token           Token for set-time-of-day
In              Number Inputs   7
In              Number Outputs  1
In              Year            Year
In              Month           1-12
In              Day             1-31
In              Hour            0-23
In              Minute          0-59
In              Second          0-59
In              Nanosecond      0-999999999
Out             Status          990x: Extended Delay where x is a number 0-5 (see text below)
                                0: Success
                                -1: Hardware Error
                                -3: Parameter Error

Software Implementation Note: When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling again. However, software may issue the call again either earlier or later than this. R1--2. RTAS must set the time of day to the best resolution provided by the platform. R1--3. RTAS must return a Status of -3 (Parameter Error) to the set-time-of-day RTAS call when the specified date is outside the range supported by the platform. Software Implementation Note: The OS maintains the clock in UTC. This allows the OS and diagnostics to co-exist with each other and provide uniform handling of time. Refer to Requirement for further details on the time of day clock.

<emphasis>set-time-for-power-on</emphasis>

Some platforms provide the ability to set a time at which the platform powers on. The set-time-for-power-on call provides the interface to the OS for setting this timer.

R1--1. RTAS must implement the set-time-for-power-on call using the argument call buffer defined by .

Argument Call Buffer <emphasis>set-time-for-power-on</emphasis>

Parameter Type  Name            Values
In              Token           Token for set-time-for-power-on
In              Number Inputs   7
In              Number Outputs  1
In              Year            Year
In              Month           1-12
In              Day             1-31
In              Hour            0-23
In              Minute          0-59
In              Second          0-59
In              Nanosecond      0-999999999
Out             Status          990x: Extended Delay where x is a number 0-5 (see text below)
                                0: Success
                                -1: Hardware Error
                                -2: Clock Busy, Try again later
                                -3: Parameter Error

Software Implementation Note: When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling again. However, software may issue the call again either earlier or later than this. R1--2. Hardware must support power on times of up to four weeks into the future, at a minimum. R1--3. RTAS must schedule the time for power on as close as it can approach to the desired time. Software Implementation Note: Hardware limitations on the duration of the power-on timer may result in power-on sooner than requested by software. If present in the /rtas node, the OF property “power-on-max-latency” gives in days the maximum power-on duration capability of the hardware. If the property is not present, software should expect the default of a maximum of 28 days. A “day” is defined as 24 hour increments from the current time. R1--4. If the system is in a powered down state at the time scheduled by set-time-for-power-on (within the accuracy of the clock), then power must be reapplied and the system must go through its power on sequence. R1--5. RTAS must return a Status of -3 (Parameter Error) to the set-time-for-power-on RTAS call when the specified date is outside the range supported by the platform (such as before current TOD).
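The interaction between the requested power-on time, the current time of day, and the “power-on-max-latency” capability can be sketched as follows. The function name and the returned labels are hypothetical. The 28-day default comes from the note above; treating a request equal to the current time as being in the past is an assumption.

```python
from datetime import datetime, timedelta

DEFAULT_MAX_LATENCY_DAYS = 28  # default when "power-on-max-latency" is absent


def classify_power_on_request(now, requested, max_latency_days=None):
    """Classify a requested power-on time relative to the current time of day."""
    if max_latency_days is None:
        max_latency_days = DEFAULT_MAX_LATENCY_DAYS
    if requested <= now:
        # Outside the supported range (before current TOD): Status -3 expected.
        return "expect -3"
    # A "day" is a 24 hour increment from the current time.
    if requested > now + timedelta(days=max_latency_days):
        # Hardware may power on sooner than requested.
        return "may power on early"
    return "within capability"
```

Software that needs a power-on time beyond the hardware capability would have to re-arm the timer after an early power-on; how it does so is outside the scope of this sketch.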

Error and Event Reporting

The error and event reporting RTAS calls are designed to provide an abstract interface into hardware registers in the system that may contain correctable or non-correctable errors and to provide an abstract interface to certain platform events that may be of interest to the OS. Such errors and events may be detected either by a periodic scan or by an exception trap. These functions are not intended to replace the normal error handling in the OS. Rather, they enhance the OS’s abilities by providing an abstract interface to check for, report, and recover from errors or events on the platform that are not necessarily known to the OS.

The OS uses the error and event RTAS calls in two distinct ways:

1. Periodically, the OS calls event-scan to have the system firmware check for any errors or events that have occurred.
2. Whenever the OS receives an interrupt or exception that it cannot fully process, it calls check-exception.

The first case covers all errors and events that do not signal their occurrence with an interrupt or exception. The second case covers those errors and events that do signal with an interrupt or exception. It is platform dependent whether any specific error or event causes an interrupt on that platform.

R1--1. RTAS must return the event generated by a particular interrupt or event source by either check-exception or event-scan, but not both.

R1--2. check-exception and event-scan, on a 64-bit capable platform, must be able to handle platform resources that are accessed using 64-bit addresses when instantiated in 32-bit mode.

<emphasis>event-scan</emphasis>

R1--1. RTAS must implement an event-scan call using the argument call buffer defined by .

Argument Call Buffer <emphasis>event-scan</emphasis>

Parameter Type  Name            Values
In              Token           Token for event-scan
In              Number Inputs   4
In              Number Outputs  1
In              Event Mask      Mask of event classes to process
In              Critical        Indicates whether this call is required to complete quickly
In              Buffer          Real address of error log
In              Length          Length of error log buffer
Out             Status          1: No Errors Found
                                0: New Error Log returned
                                -1: Hardware Error

R1--2. The event-scan call must fill in the error log with a single error log formatted as specified in . If necessary, the data placed into the error log must be truncated to length bytes. R1--3. RTAS must only check for errors or events that are within the classes defined by the Event mask. Event mask is a bit mask of error and event classes. Refer to for the definition of the bit positions. R1--4. If Critical is non-zero, then RTAS must perform only those operations that are required for continued operation. No extended error information is returned. R1--5. The event-scan call must return the first found error or event and clear that error or event so it is only reported once. R1--6. The OS must continue to call event-scan while a Status of “New Error Log returned” is returned. R1--7. The event-scan call must be made at least “rtas-event-scan-rate” times per minute for each error and event class and must have the Critical parameter equal to 0 for this periodic call. R1--8. The platform must not return more than two error logs during the first sequence of event-scan RTAS calls after boot of an OS image, and must not return more than one error log to that OS image during any sequence of event-scan RTAS calls after the first time a non-zero Status is returned. Software Implementation Notes: In a multiprocessor system, each processor should call event-scan periodically, not always the same one. The event-scan function needs to be called a total of “rtas-event-scan-rate” times a minute. The maximum size of the error log is specified in the OF device tree as the “rtas-error-log-max” property of the /rtas node. This call does not log the error in NVRAM. It returns the error log to the OS. It is the responsibility of the OS to take appropriate action. For best system performance, the requested “rtas-event-scan-rate” should be as low as possible, and as a goal should not exceed 120 scans per minute. Maximum system performance is obtained when no scans are required.
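The periodic scan described above amounts to calling event-scan repeatedly while it keeps returning Status 0 (New Error Log returned). The sketch below assumes a hypothetical callable `event_scan(event_mask, critical, buffer, length)` that wraps the RTAS call and returns the Status; the real calling mechanism is not shown. The helper that converts “rtas-event-scan-rate” into an interval is likewise an illustration.

```python
NEW_ERROR_LOG = 0


def scan_interval_seconds(scan_rate_per_minute):
    """Longest allowed interval between scans for a given rate per minute."""
    return 60.0 / scan_rate_per_minute


def drain_event_scan(event_scan, event_mask, log_size):
    """Call event_scan until it stops returning Status 0.

    Returns (final Status, list of error logs). Critical is passed as 0,
    as required for the periodic call."""
    logs = []
    while True:
        buffer = bytearray(log_size)  # sized from "rtas-error-log-max"
        status = event_scan(event_mask, 0, buffer, log_size)
        if status != NEW_ERROR_LOG:
            return status, logs
        logs.append(bytes(buffer))
```

Each returned log is reported only once, so the caller is responsible for recording or acting on every entry in the returned list.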

<emphasis>check-exception</emphasis>

R1--1. RTAS must implement a check-exception call using the argument call buffer defined by .

Argument Call Buffer <emphasis>check-exception</emphasis>

Parameter Type  Name                    Values
In              Token                   Token for check-exception
In              Number Inputs           6 (without Extended Information)
                                        7 (with Extended Information)
In              Number Outputs          1
In              Vector Offset           The vector offset for the exception. See .
In              Additional Information  Information which RTAS may need to determine the cause of the exception, but which may be unavailable to it in hardware registers. See for details.
In              Event Mask              Mask of event classes to process
In              Critical                Indicates whether this call is required to complete quickly
In              Buffer                  Real address of error log
In              Length                  Length of error log
In              Extended Information    See Requirement .
Out             Status                  1: No Errors Found
                                        0: New Error Log returned
                                        -1: Hardware Error

R1--2. The OS must provide the value specified in in the Additional Information parameter in the call to check-exception, with the Number Inputs parameter set to 6. If the value (e.g., SRR1) is too large to fit in this cell, the lower 32-bits must be provided here, the upper 32-bits provided in the Extended Information parameter, and the Number Inputs parameter set to 7.

Additional Information Provided to <emphasis>check-exception</emphasis> call

Source of Interrupt       Value of “Additional Information” Variable
External Interrupt        Interrupt number
Machine check exception   Value of register SRR1 at entry to machine check handler
System Reset exception    Value of register SRR1 at entry to system reset handler
Other exception           Value of register SRR1 at entry to exception handler
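The split of a value between the Additional Information and Extended Information parameters can be sketched as below. The function name is hypothetical, and the sketch interprets “too large to fit” as the value having non-zero bits above the low 32, which is one possible reading of the requirement.

```python
def check_exception_extra_args(value):
    """Return (Number Inputs, Additional Information, Extended Information).

    Extended Information is None when the value fits in one 32-bit cell."""
    low = value & 0xFFFFFFFF           # lower 32 bits: Additional Information
    high = (value >> 32) & 0xFFFFFFFF  # upper 32 bits: Extended Information
    if high == 0:
        return 6, low, None
    return 7, low, high
```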

R1--3. The check-exception call must fill in the error log with a single error log formatted as specified in . The data in the error log must be truncated to length bytes. R1--4. If Critical is non-zero, then RTAS must perform only those operations that are required for continued operation. No extended error information is returned. R1--5. The check-exception call must return the first found error or event and clear that error or event so it is only reported once. R1--6. RTAS must only check for errors or events that are within the classes defined by the Event mask. Event mask is a bit mask of error and event classes. Refer to for the definition of the bit positions. Software Implementation Notes: All OS reserved exception handlers should call check-exception to process any errors that are unknown to the OS. The interrupt number for external device interrupts is provided in the OF device tree as specified in . Software, with knowledge of the class of event it seeks, matches the data in the Vector Offset, Additional Information, and Extended Information with the Event Mask such that ambiguity does not result.

<emphasis>rtas-last-error</emphasis>

R1--1. RTAS must implement an rtas-last-error call using the argument call buffer defined in .

Argument Call Buffer <emphasis>rtas-last-error</emphasis>

Parameter Type  Name            Values
In              Token           Token for rtas-last-error
In              Number Inputs   2
In              Number Outputs  1
In              Buffer          Real address of error log
In              Length          Length of error log buffer
Out             Status          1: No Errors Found
                                0: New Error Log Returned
                                -1: Hardware Error (cannot create log)

R1--2. The rtas-last-error call must fill in the error log with a single error log formatted as specified in . If necessary, the data placed into the error log must be truncated to “length” bytes.

R1--3. RTAS must only check for hardware errors that occurred during a prior call to some other RTAS function, resulting in a -1 (Hardware Error) return Status.

Software Note: This function is intended to provide the OS with more detailed failure information after an RTAS call returns with a -1 (Hardware Error) Status, and should not be called except for this purpose. If rtas-last-error itself returns a -1 Status, then it could not create the error log data because of a further error, and the OS should not try to call it again.

Platform Dump Option The architectural intent of the Platform Dump option is to allow a mechanism for the platform to communicate a variety of dump data used to debug problems within the platform firmware or hardware.

<emphasis>ibm,platform-dump</emphasis>

This RTAS call is used to transfer dump data from the platform to the OS. It is expected that this routine will have to be called several times to complete the transfer of the diagnostic dump data. It is also anticipated that multiple dumps could be in the process of completion at the same time. Individual dumps are identified by a dump tag passed by the OS. The OS may interleave calls to ibm,platform-dump with different RTAS calls. Other standard RTAS locking rules apply (for example, only one processor may call RTAS at a time). The OS only makes the ibm,platform-dump RTAS call when an event scan returns an error log with an Event Type of “Dump Notification” as described in Version 6 or later of the RTAS General Extended Error Log Format.

R1--1. For the Platform Dump option: The RTAS function ibm,platform-dump must be implemented and must implement the argument call buffer as defined by .

Argument Call Buffer <emphasis>ibm,platform-dump</emphasis>

Parameter Type  Name                Values
In              Token               Token for ibm,platform-dump
In              Number of Inputs    6
In              Number of Outputs   5
In              Dump_Tag_Hi         Most-significant 32 bits of a Dump_Tag representing an id of the dump being processed
In              Dump_Tag_Lo         Least-significant 32 bits of a Dump_Tag representing an id of the dump being processed
In              Sequence_Hi         Most-significant 32 bits of the Sequence, a value indicating what portion of a dump is to be returned by the call. A Sequence of 0 returns the beginning of the dump. On each subsequent call, as needed, the value should be set to the Next_Sequence value returned from the previous call.
In              Sequence_Lo         Least-significant 32 bits of the Sequence
In              Buffer              Address of dump buffer (NULL indicates completion of processing)
In              Length              Length of the buffer in bytes (min. 1024)
Out             Status              -1: Hardware error
                                    -2: Busy, try again later
                                    -9002: Not Authorized
                                    0: Dump complete
                                    1: Continue dump
                                    990x: Extended Delay where x is a number 0-5
Out             Next_Sequence_Hi    Most-significant 32 bits of the Next_Sequence value indicating the portion of the dump to be retrieved on the next call if needed. (If Status is returned as 0, then the dump is complete and there is no next call required. The value of Next_Sequence in this case is undefined.)
Out             Next_Sequence_Lo    Least-significant 32 bits of the Next_Sequence value
Out             Bytes_Returned_Hi   Most-significant 32 bits of the Bytes_Returned value indicating the number of valid bytes returned in the Buffer
Out             Bytes_Returned_Lo   Least-significant 32 bits of the Bytes_Returned value indicating the number of valid bytes returned in the Buffer

Software Implementation Note: When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling ibm,platform-dump again. However, software may issue the ibm,platform-dump call again either earlier or later than this.

R1--2. For the Platform Dump option: On the first call to ibm,platform-dump of a platform dump sequence for a given Dump_Tag, the Sequence value must be initialized to zero and on subsequent calls for the same tag, the Sequence must be set to Next_Sequence of the previous call made with the same Dump_Tag or else set to zero to restart the entire dump sequence.

R1--3. For the Platform Dump option: The dump tag passed to any call to ibm,platform-dump must be a value specified by the platform and communicated to the OS by an event-scan error log entry.

R1--4. For the Platform Dump option: Once a Status of 0 (Dump complete) or -1 (Hardware error) is returned for the ibm,platform-dump call with a particular dump tag, the dump is considered complete from a platform standpoint, but for the “Dump complete” case the OS must signal to the platform that the processing of the dump has been completed by a final call for the Dump_Tag with the Buffer address set to NULL.

R1--5. For the Platform Dump option: If at any time a partition receives a -9002, Not Authorized, return code for an ibm,platform-dump RTAS call, the partition must cease attempting to acquire the dump information it was in process of acquiring and discard any portion already acquired.

Programming Note: It is expected that a platform generally only transmits a dump to a single partition. However, the above requirement makes provision for the platform abandoning the transmission of a dump to a partition after it has been initiated, presumably to re-initiate transmission to a different partition or to a Hardware Management Console (HMC).

R1--6. For the Platform Dump option: The contents of dump information returned through the sequence of calls to ibm,platform-dump must follow a dump directory structure as defined in .

R1--7. For the Platform Dump option: Collectively the dump data returned from a sequence of ibm,platform-dump calls for a given Dump_Tag must consist of one dump file directory entry as described in followed by one or more dump section directory entries as described in followed by a dump data section for each dump section directory entry earlier included.

Programming Notes:

As required in , the OS can determine the maximum size of a copy of each dump that can be returned by issuing an ibm,get-system-parameter for the platform-dump-max-size. In addition, in the case of any change in the value of this parameter, the platform may generate a Platform Event Log entry announcing the change in the maximum size, and specifying the new size in the IO Events Section. This entry, when generated, is then returned by the event-scan RTAS call.

The Dump_Tag is taken from the Dump Locator Section of the Platform Error/Event Log Format, Version 6 or later. Specifically, Dump_Tag_Hi is composed of the 8 bit Dump Type as found in the Dump Locator Section, padded with 24 bits on the left to make a 32 bit quantity. The Dump_Tag_Lo is the Dump ID found in the Dump Locator Section of the Error log entry.

If the ibm,platform-dump RTAS routine returns with the Status of 1 (Continue dump), the transfer is proceeding but had to be suspended to maintain the short execution time requirement of RTAS routines or because more data was available than the Buffer could contain. The Bytes_Returned value indicates how many bytes of dump data (if any) were returned on a call, and the OS must be prepared to handle the case of no bytes returned. When Continue dump Status (1) is returned, this indicates that there is more dump data available than was returned in the buffer. A subsequent call with the same Dump_Tag and the Sequence value being set to the Next_Sequence returned from the previous call returns additional dump data.

When a dump has been successfully transmitted, the Status of 0 (Dump complete) is returned. If there is a hardware error preventing a dump from being successfully transmitted, a Status of -1 (Hardware error) is returned. In either case, the Dump sequence is completed. It should be noted that the final Next_Sequence value returned is undefined.

After the sequence is completed, the OS should make one final call for the given Dump_Tag using a NULL buffer pointer. (The value of the Sequence parameter for this call is undefined although it is acceptable for the platform to make the value equal to the last Next_Sequence value returned.) This call tells the platform that the OS has completed processing of the dump and will not attempt to restart the sequence. If the platform used system memory to hold dump data, the platform at this point is permitted to free the associated logical memory blocks (LMBs) reserved for the dump. Successful return from the ibm,platform-dump RTAS call with a NULL buffer pointer indicates to the OS that one or more logical memory blocks (LMBs) may now be acquired by the OS. A get-sensor-state RTAS call for these LMBs returns with a state of “DR entity available for recovery (4)” after the successful return from this ibm,platform-dump RTAS call.

If a platform does not receive the NULL buffer pointer call for a given Dump_Tag but subsequently boots the partition, the platform may report the presence of the dump again on an event-scan after the boot.
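The dump retrieval sequence described above can be sketched as a loop. The callable `platform_dump(tag_hi, tag_lo, seq_hi, seq_lo, buffer, length)` is a hypothetical wrapper around the RTAS call that returns the five outputs as a tuple; a buffer of None stands in for the NULL pointer. Zero padding of the Dump Type in Dump_Tag_Hi is an assumption, and the busy and 990x cases are retried immediately here where real software would normally delay first.

```python
def split64(value):
    """Split a 64-bit value into (most-significant, least-significant) 32 bits."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


def join64(hi, lo):
    return (hi << 32) | lo


def dump_tag(dump_type, dump_id):
    """Build (Dump_Tag_Hi, Dump_Tag_Lo); assumes zero padding of the type."""
    return dump_type & 0xFF, dump_id & 0xFFFFFFFF


def fetch_dump(platform_dump, tag_hi, tag_lo, buffer_size=1024):
    """Retrieve one dump. Returns (final Status, dump bytes)."""
    data = bytearray()
    sequence = 0  # first call of a sequence uses Sequence 0
    while True:
        seq_hi, seq_lo = split64(sequence)
        buffer = bytearray(buffer_size)
        status, next_hi, next_lo, got_hi, got_lo = platform_dump(
            tag_hi, tag_lo, seq_hi, seq_lo, buffer, buffer_size)
        if status == -2 or 9900 <= status <= 9905:
            continue  # busy or extended delay: retry the same Sequence
        if status == -9002:
            return status, b""  # not authorized: discard what was acquired
        if status in (0, 1):
            # Bytes_Returned may legitimately be zero.
            data += buffer[:join64(got_hi, got_lo)]
        if status == 1:
            sequence = join64(next_hi, next_lo)
            continue
        if status == 0:
            # Final call with a NULL buffer signals that processing is done.
            platform_dump(tag_hi, tag_lo, seq_hi, seq_lo, None, 0)
        return status, bytes(data)
```

Accumulating the whole dump in memory keeps the sketch short; an OS would more likely stream each buffer to storage as it arrives.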

Platform Dump Directory Structure

The entire dump contents returned over a sequence of ibm,platform-dump RTAS calls for a given Dump_Tag follows a directory/data structure as illustrated in and where a dump consists of one File Directory Entry, one or more Section Directory Entries and one data section for each Section Directory entry.

Platform Dump File Directory Entry Format

Entry Header
  Length: 8 Bytes
  Values: “FILE”
  Discussion: Identifies the type of entry that follows. The value is ASCII consisting of the characters “FILE” and 4 ASCII blanks.

Entry Length
  Length: 2 Bytes
  Values: Number of bytes of the entire file directory entry
  Discussion: This length includes the Entry Header and Entry Length fields.

Reserved
  Length: 6 Bytes

Flags
  Length: 4 Bytes
  Values: See .

Entry Type
  Length: 2 Bytes
  Values: 0x0001
  Discussion: 0x0001 signifies a file entry.

Prefix Length
  Length: 2 Bytes
  Values: Number of bytes of the Dump File Base Name that is considered to be a prefix.

Dump File Base Name
  Length: Length in bytes computed as “Entry length” - 24, but not to exceed 46 characters including the ASCII NULL string termination.
  Values: NULL terminated ASCII String consisting of ASCII characters in the ranges of a-z, A-Z, 0-9, and the ASCII “.”
  Discussion: Gives a base name for the dump file to be created from the dump data. This base name is composed of a prefix followed by additional data (e.g. dumptype.serialnumber.dumpID.timestamp where dumptype.serialnumber is the prefix).

Dump Section Directory Entry Format

Entry Header
  Length: 8 Bytes
  Values: “SECTION”
  Discussion: Identifies the type of entry that follows. The value is ASCII consisting of the characters “SECTION” and 1 ASCII blank.

Entry Length
  Length: 2 Bytes
  Values: Number of bytes of the entire section directory entry
  Discussion: This length includes the Entry Header and Entry Length fields.

Priority
  Length: 2 Bytes
  Values: Unsigned integer
  Discussion: See programming note after .

Reserved
  Length: 4 Bytes

Flags
  Length: 4 Bytes
  Values: See .

Entry Type
  Length: 2 Bytes
  Values: 0x0002
  Discussion: 0x0002 signifies a section entry.

Reserved
  Length: 2 Bytes

Section Length
  Length: 8 Bytes
  Values: Length in bytes of the section of the dump that this entry is the directory for

Section Name
  Length: Length in bytes computed as “Entry length” - 32, but not to exceed 46 characters including the ASCII NULL string termination.
  Values: NULL terminated ASCII String.
  Discussion: Gives a name to the dump section for which this entry is a directory.

The two previous tables refer to a set of flags used to describe information related to a dump section. The options are stored in a single 32 bit value which is the bit-wise OR'ing of each option value defined in .

Dump File Format Directory Options

last_flag
  Bit Position(s) of Option: 0x00000001
  Definition: Binary value set to 1 if the last directory entry.
  Discussion: Flag is never set for the File Directory entry since at least one Section Directory entry follows.

not_transmitted
  Bit Position(s) of Option: 0x00000002
  Definition: If set to 1, indicates that the data for the block has not been transmitted during some process of dump transfer.
  Discussion: Platform always sets this value to 0. The bit may be set to 1 by applications transmitting a dump. See Software Implementation Note item 2 later in this section.

Reserved
  Bit Position(s) of Option: All but bit positions shown above
  Definition: All other values reserved

Software Implementation Notes:

1. Platforms supporting the ibm,platform-dump call may have several unique dump types. All dumps of the same type on a partition have the same “prefix” to the name of the dump file as indicated in the dump file directory entry in an error log.
2. The priority in the priority field of the section directory entries allows an application transmitting a dump to a remote support center to decide what sections of data to transmit when the connection bandwidth is limited. Zero is the highest priority. All sections at the same priority shall be transmitted if any at that priority are transmitted. It is intended that all directory entries be transmitted with the section length set to zero and their not_transmitted Dump File Format Directory Options flag set to a 1 if the section data cannot be transmitted.
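The fixed-size portions of the two directory entry formats add up to 24 bytes (file entry) and 32 bytes (section entry), which matches the “Entry length” - 24 and “Entry length” - 32 name lengths given above. The following sketch parses both entry types with Python's struct module. The function names and returned dictionaries are hypothetical, and big-endian byte order for the multi-byte fields is an assumption that the text above does not state.

```python
import struct

# Assumed big-endian layouts; "x" marks reserved bytes that are skipped.
FILE_FMT = ">8sH6xIHH"        # header, length, reserved, flags, type, prefix length
SECTION_FMT = ">8sHH4xIH2xQ"  # header, length, priority, reserved, flags, type,
                              # reserved, section length
LAST_FLAG = 0x00000001
NOT_TRANSMITTED = 0x00000002


def _name(raw):
    """Decode a NULL terminated ASCII string."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def parse_file_entry(data):
    header, length, flags, entry_type, prefix_len = struct.unpack_from(FILE_FMT, data)
    if header != b"FILE    " or entry_type != 0x0001:
        raise ValueError("not a file directory entry")
    name = _name(data[struct.calcsize(FILE_FMT):length])
    return {"length": length, "flags": flags,
            "name": name, "prefix": name[:prefix_len]}


def parse_section_entry(data):
    header, length, priority, flags, entry_type, section_len = struct.unpack_from(
        SECTION_FMT, data)
    if header != b"SECTION " or entry_type != 0x0002:
        raise ValueError("not a section directory entry")
    return {"length": length, "priority": priority,
            "last": bool(flags & LAST_FLAG),
            "not_transmitted": bool(flags & NOT_TRANSMITTED),
            "section_length": section_len,
            "name": _name(data[struct.calcsize(SECTION_FMT):length])}
```

A reader of a complete dump would parse the file entry first, then parse section entries until one has last_flag set, and then expect one data section per section entry.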

PCI Configuration Space

Device drivers and system software need access to PCI configuration space. section on "Address Map" defines system address spaces for PCI memory and PCI I/O spaces. It does not define an address space for PCI configuration. Different PCI bridges may implement the mechanisms for accessing PCI configuration space in different ways. The RTAS calls in this section provide an abstract way of reading and writing PCI configuration spaces.

The PCI access functions take a config_addr input parameter which is similar to the Type 1 PCI configuration space address. For conventional PCI and PCI-X Mode 1, this address is a 24-bit quantity composed of bus, device, function, and register numbers. This allows the configuration of up to 256 buses (including sub-bridges), 32 IOAs per bus, 8 functions per IOA, and 256 bytes of register space per function. PCI-X Mode 2 and PCI Express define an extended configuration space with an additional 4-bit quantity which specifies an extended register number allowing for 4096 bytes of register space per function. Refer to the or the for more details. The config_addr for an IOA is derived from the OF device tree, and is defined in .

The ibm,read-pci-config and ibm,write-pci-config RTAS calls allow for the specification of the PHB Bus Unit ID, and therefore allow for up to 256 unique config_addr bus numbers per PHB. Note that for each PCI connector, there may be multiple PCI bus numbers, because plug-in PCI cards may contain PCI to PCI bridges, which create other PCI buses.

The PCI Local Bus Specification requires that unimplemented or reserved register space read as 0’s, and that reads of the Vendor ID register of IOAs or functions which aren’t present should be unambiguously reported (reading 0xFFFF is sufficient). Writes to unimplemented or reserved register space are specified as no-ops. Writes to IOAs or functions which aren’t present are undefined. These operations are undefined if a bus is specified which doesn’t exist.

R1--1. For the RTAS PCI configuration space and EEH functions where the parameter config_addr is requested as input, the config_addr parameter must be as specified by the hi cell of the physical address in Open Firmware Working Group proposal number 516 Ver 1.8 (see ), with the upper register address bits added for PCI-X Mode 2 and PCI Express, in order to access past the first 256 bytes of configuration space.

Definition <emphasis>Config_addr</emphasis>

Bit     Definition
0:3     Upper bits of the Register Number, when applicable, otherwise 0. Set to 0 when the PCI extended configuration space is not available, due to lack of support somewhere from the PHB to the IOA. When a value of this field can be something other than 0, the “ibm,pci-config-space-type” property will exist in the IOA's node with a value indicating that the extended space is supported.
4:7     Reserved (set to 0)
8:15    Bus Number
16:20   Device Number
21:23   Function Number, when applicable, otherwise 0
24:31   Lower bits of the Register Number, when applicable, otherwise 0

R1--2. All RTAS PCI Read/Write functions must follow the appropriate PCI specification. R1--3. RTAS must follow the rules of when accessing PCI configuration space. Software Implementation Notes: Since PCI Configuration space is defined to be Little-Endian, RTAS accesses this area using the byte-reversed forms of the Load and Store instructions. In this fashion, the values passed are defined Big-Endian. Prior to accessing the extended configuration address space of PCI-X Mode 2 and PCI Express devices, an IOA device driver is responsible for checking if the “ibm,pci-config-space-type” property (see ) of the IOA's node exists and is set to a non-zero value.
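The config_addr bit layout can be illustrated by packing the fields into a 32-bit cell. The sketch below assumes that bit 0 in the definition is the most-significant bit of the cell, so that bits 0:3 land in the top four bits and bits 24:31 in the low byte. The function name is hypothetical.

```python
def make_config_addr(bus, device, function, register):
    """Pack bus, device, function, and register numbers into a config_addr.

    Assumes bit 0 of the definition is the most-significant bit of the cell."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 4096):
        raise ValueError("field out of range")
    return ((((register >> 8) & 0xF) << 28)  # bits 0:3, upper register bits
            | (bus << 16)                    # bits 8:15
            | (device << 11)                 # bits 16:20
            | (function << 8)                # bits 21:23
            | (register & 0xFF))             # bits 24:31, lower register bits
```

A register number of 256 or more is only meaningful when the “ibm,pci-config-space-type” property indicates that the extended configuration space is supported; the sketch does not check that condition.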

<emphasis>ibm,read-pci-config</emphasis> R1--1. For Platforms which may have greater than 256 PCI Buses: RTAS must implement an ibm,read-pci-config call using the argument call buffer defined by .
Argument Call Buffer <emphasis>ibm,read-pci-config</emphasis>
Parameter Type | Name | Values
In | Token | Token for ibm,read-pci-config
In | Number Inputs | 4
In | Number Outputs | 2
In | Config_addr | Configuration Space Address
In | PHB_Unit_ID_Hi | Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | PHB_Unit_ID_Low | Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | Size | Size of Configuration Cycle in bytes, value can be 1, 2, or 4
Out | Status | 0: Success; -1: Hardware Error; -3: Parameter Error
Out | Value | Value Read from the location specified by the PHB Unit ID and config_addr

R1--2. The ibm,read-pci-config call must return the value from the configuration register which is at the location specified by the PHB Unit ID and config_addr in PCI configuration space. R1--3. The ibm,read-pci-config call must perform a 1-byte, 2-byte, or 4-byte configuration space read depending on the value of the size input argument. R1--4. The config_addr must be aligned to a 2-byte boundary if size is 2 and to a 4-byte boundary if size is 4. R1--5. The ibm,read-pci-config call of IOAs or functions which are not present or which are not available to the caller must return Success with all ones as the output value.
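The size and alignment rules above, together with the split of the PHB Unit ID into two 32-bit words, can be sketched as follows. All function names here are hypothetical helpers for illustration; the sketch only prepares and interprets values and does not model the RTAS call itself.

```python
def split_unit_id(phb_unit_id: int) -> tuple:
    """Split a 64-bit PHB Unit ID into (hi, lo) 32-bit halves."""
    if not 0 <= phb_unit_id < 1 << 64:
        raise ValueError("PHB Unit ID must fit in 64 bits")
    return (phb_unit_id >> 32, phb_unit_id & 0xFFFFFFFF)


def read_pci_config_inputs(config_addr: int, phb_unit_id: int, size: int) -> tuple:
    """Return the four input words in call-buffer order, or raise ValueError."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4")
    # The low bits of config_addr are the low register bits, so a simple
    # modulo check covers the 2-byte and 4-byte alignment rules.
    if config_addr % size != 0:
        raise ValueError("config_addr is not aligned to the access size")
    hi, lo = split_unit_id(phb_unit_id)
    return (config_addr, hi, lo, size)


def reads_as_absent(value: int, size: int) -> bool:
    """True when a successful read returned all ones for the given size."""
    return value == (1 << (8 * size)) - 1


print(read_pci_config_inputs(0x011310, 0x0000000180000020, 4))
```

An all-ones value is only a hint that the IOA or function is absent or unavailable to the caller; a register could in principle hold that value legitimately, which is why the Vendor ID register is the usual place to check.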

<emphasis>ibm,write-pci-config</emphasis> R1--1. For Platforms which may have greater than 256 PCI Buses: RTAS must implement an ibm,write-pci-config call using the argument call buffer defined by .
Argument Call Buffer <emphasis>ibm,write-pci-config</emphasis>
Parameter Type | Name | Values
In | Token | Token for ibm,write-pci-config
In | Number Inputs | 5
In | Number Outputs | 1
In | Config_addr | Configuration Space Address
In | PHB_Unit_ID_Hi | Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | PHB_Unit_ID_Low | Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | Size | Size of Configuration Cycle in bytes, can be 1, 2, or 4
In | Value | Value to be written to the location specified by the PHB Unit ID and config_addr
Out | Status | 0: Success; -1: Hardware Error; -3: Parameter or device enablement error

R1--2. The ibm,write-pci-config call must store the value to the configuration register which is at the location specified by the PHB Unit ID and config_addr in PCI configuration space. R1--3. The ibm,write-pci-config call must perform a 1-byte, 2-byte, or 4-byte configuration space write depending on the value of the size input argument. R1--4. The config_addr must be aligned to a 2-byte boundary if size is 2 and to a 4-byte boundary if size is 4. R1--5. The ibm,write-pci-config call of IOAs or functions which are not present or which are not available to the caller must be ignored and a Status of 0 (Success) must be returned. R1--6. For the LPAR option: The Status of -3 (Parameter or device enablement error) must be returned if all the following are true: The OS attempts an ibm,write-pci-config to enable Memory or I/O for an IOA, without first calling ibm,set-eeh-option to enable EEH for the IOA Enabling the IOA could expose other partitions to errors from the partition which is enabling the IOA The hypervisor is enforcing EEH mode Platform Implementation Note: In Requirement , cross-partition errors could be caused due to error domains which are shared between the partitions. However, it is acceptable to share error domains when the IOA and its device driver and the partition's OS cannot (through error or maliciously) cause errors which affect another partition.
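The LPAR rule in R1--6 is a conjunction of three conditions, which the following sketch makes explicit. The function and parameter names are hypothetical; how a hypervisor actually determines each condition is platform specific and not shown.

```python
PARAMETER_OR_ENABLEMENT_ERROR = -3  # Status value from the call buffer table


def must_reject_write(enables_memory_or_io: bool,
                      eeh_enabled_for_ioa: bool,
                      could_expose_other_partitions: bool,
                      hypervisor_enforces_eeh: bool) -> bool:
    """True only when every R1--6 condition holds at once."""
    return (enables_memory_or_io
            and not eeh_enabled_for_ioa
            and could_expose_other_partitions
            and hypervisor_enforces_eeh)


# A write that enables Memory space before EEH was enabled, on a shared
# error domain, with the hypervisor enforcing EEH mode:
if must_reject_write(True, False, True, True):
    print(PARAMETER_OR_ENABLEMENT_ERROR)
```

If any one condition is false, this rule does not force a -3 Status, although the other requirements on the call still apply.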

Operator Interfaces and Platform Control The RTAS operator interface and platform control functions provide the OS with the ability to perform platform services in a portable manner. The RTAS operator interface provides the ability for the OS to notify the user about OS events during boot, to notify the user of abnormal events, and to obtain information from the platform. The platform control functions give the OS the ability to obtain platform-specific information and to control platform features. These calls are all “best effort” calls. RTAS should make its best effort to implement the intent of the call. If the Platform Hardware does not implement some optional feature, it is permitted for RTAS to either return an error, or to virtualize the service in some way and return “Operation Succeeded.” Software Implementation Notes: For example, a keyswitch could be virtualized by storing a keyswitch value in NVRAM and by providing a user interface to modify this value. The RTAS call get-sensor-state on the keyswitch returns the value stored in NVRAM. If these services are only called prior to the use of any of the underlying devices by the OS, for example, during boot time, or only after the OS has finished using the devices, for example, during a crash, then the OS can avoid mutual exclusion and sharing concerns. Otherwise, synchronization per , must be performed.

Op Panel Display R1--1. Platform Implementation: All servers must implement an operator panel display mechanism by supporting the display-character RTAS call. Implementation Note: The operator display mechanism in Requirement may be a physical alphanumeric display with a special purpose LCD device marked “used by RTAS”, or it may be some other virtualized display which is accessible through some method not defined by this architecture. R1--2. Platform Implementation: Servers which provide display-character must provide a line length of at least 16 characters. Software Implementation Notes: There are currently four uses for the op panel display. The first is for display of an error code, if needed, from the Built-In-Self-Test (BIST) or Power-On-Self-Test (POST). This display is machine dependent. (These tests are executed prior to loading the OS or the operation of OF. Any display requirements are handled within the hardware.) The second is for progress indication during initialization and boot. This display is four digits and is updated as boot proceeds. The third is for display after a failure running diagnostics. In this case, a service request number (SRN) is displayed along with a FRU location code list of possible devices needing service. These numbers and locations can be longer than four characters. The SRN may be over 12 characters and a FRU location code list is one or more items, typically three, of 2 to 32 characters. The fourth is a crash code from the OS which is 12 characters indicating cause and dump status. The RTAS set-indicator call with token #6 specifies 4 hex digits. The display-character call requires a minimum display size of one line of 4 characters, but a larger display may be made known to the OS using the “ibm,” extension properties defined in . 
When the message to be displayed is larger than the OS believes the display to be, the OS should perform appropriate truncation, scrolling, or otherwise meaningfully display the message using the platform’s display resource. Some servers implement a display larger than the default. For these servers, the “ibm,display-line-length” property and the “ibm,display-number-of-lines” property are set appropriately. If the OS assumes the default display, the 2X16 display still works. It appears to be working in the bottom line and scrolling through the top line as long as only CR and LF are issued for control. The OF device tree properties indicate what is supported.

Service Processor A service processor is not a platform requirement. Larger servers tend to be implemented with service processors. When implemented, the service processor is not seen directly as a device by software. All of its services which are visible to the OS, are abstracted with RTAS. The service processor may support the operator panel, manage sensors and indicators, run diagnostics, monitor the platform environment and save error logs. There is clearly an interface between RTAS and the service processor, but that interface is not intended to be used by the OS. The service processor, in those platforms which choose to use one, is key in the initialization of the platform and has interfaces with the OF code. It is also involved with VPD collection and NVRAM access during initialization. It can also provide a serial port for a remote service capability. The service processor is also a significantly slower processor than the primary PA processor. Therefore, in the implementation of RTAS functions which use the service processor, care should be taken to avoid interlocks with the service processor which could significantly impair performance.

Surveillance Platforms which include a service processor have the needed mechanism for a surveillance function; that is, the OS and the service processor can monitor each other. For example, if the OS crashes or hangs, or if the service processor has failures, a failure notification could occur. Notification could also occur if the platform fails during the boot process, or if it cannot complete a boot successfully. The notification can be sent to a service center or to a customer administrator, as determined by the customer setup of configuration parameters. The firmware provides notification to the OS by reporting exceptions through event-scan. The service processor can provide dial-out notification if the OS stops, or if a boot process fails. In the implementation of surveillance, the service processor monitors the OS by tracking the issuance of heartbeats generated by calls to the event-scan RTAS service. If a service processor time-out occurs prior to receiving another heartbeat, an action based on user defined call out policy occurs. This action could be to reboot, call service or power-down. The policy may be different depending on whether the time-out occurs during a boot process or during a period of normal OS operation. The default policy and time-out period, kept in NVRAM, can be changed from a service processor menu or from software. The platform can be configured such that surveillance is either enabled or disabled immediately after boot. After boot, temporary changes to the surveillance state can be made by issuing a set-indicator call to indicator 9000 (see ). The following system parameters define the default behavior of surveillance mode (see also, for more information about these parameters and for their default values). The sp-sen system parameter defines whether the default state of surveillance by the service processor is enabled (=on) or disabled (=off). 
The sp-sti system parameter defines the period of time (1-255 minutes) that the service processor should wait between heartbeats from event-scan. If the time-out period expires without the service processor receiving another heartbeat, the service processor initiates recovery and reporting actions as defined by the user. The sp-sdel system parameter defines the period of time (1-120 minutes) that the service processor should wait before starting surveillance after control passes to the OS. This value is set to allow enough time for the OS to boot and initialize to the point where it can start calling event-scan on a regular periodic basis. Architecture Note: Surveillance times out if the time of the parameter, sp-sdel, plus the time of the parameter, sp-sti, passes prior to receiving the first heartbeat. In effect, the first event-scan can be considered the signal for boot complete. The platform may perform surveillance on the service processor using event-scan to trigger checking as well as for reporting any errors found. Software Implementation Note: The surveillance here is for keeping an eye on the overall functioning of the OS. If a specific process gets hung and the OS is still functioning, it is the responsibility of the OS to detect and not the surveillance discussed here. OF Implementation Note: The OS is expected to call the event-scan RTAS service (with the internal-errors mask bit on) at the rate defined by the property “rtas-event-scan-rate” in the OF device tree. If an “rtas-event-scan-rate” of zero (0) is placed in the OF device tree and surveillance is initialized as ‘active’, a surveillance time-out occurs after the time-out period since the heartbeats are triggered by the event-scan call. If there is reason to operate with the rate = 0, the default state of surveillance ( sp-sen parameter in NVRAM) should be disabled, and the surveillance sensor and indicator should not be placed in the OF device tree. R1--1. 
Platform Implementation: The default surveillance policy must be defined by the sp-sen, sp-sti and sp-sdel system parameters, as set by the service processor or by software. R1--2. Platform Implementation: Heartbeats to the service processor must only be sent as the result of a call to the event-scan RTAS service with the internal-errors bit (bit 0) set to 1 in the call buffer Event Mask parameter. R1--3. Platform Implementation: In platforms which implement surveillance, the event-scan RTAS service may be called more than once per minute, but the heartbeat to the service processor must be sent at the rate of at least once per minute. R1--4. Platform Implementation: In platforms which implement surveillance, the ibm,os-term RTAS call must be implemented. Software Note: Requirement provides a mechanism for the OS to release control of the platform without being aware of the state of surveillance. With the definition of a default platform state for surveillance, the OS may not be aware of the function, yet surveillance may be used. Platforms may not have a dependency on the OS to turn off surveillance during normal shutdown (a shutdown not including immediate reboot).
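The timing described in the Architecture Note (the first heartbeat is due within sp-sdel plus sp-sti, and later heartbeats within sp-sti) can be sketched as below. The function names are hypothetical, times are in whole minutes, and treating the deadline as reached when the elapsed time equals the limit is an assumption of this sketch.

```python
def first_heartbeat_deadline(sp_sdel: int, sp_sti: int) -> int:
    """Minutes after control passes to the OS by which the first
    event-scan heartbeat is expected."""
    if not 1 <= sp_sdel <= 120:
        raise ValueError("sp-sdel must be 1..120 minutes")
    if not 1 <= sp_sti <= 255:
        raise ValueError("sp-sti must be 1..255 minutes")
    return sp_sdel + sp_sti


def surveillance_expired(minutes_since_os_start: int,
                         minutes_since_last_heartbeat,
                         sp_sdel: int, sp_sti: int) -> bool:
    """minutes_since_last_heartbeat is None until the first heartbeat."""
    if minutes_since_last_heartbeat is None:
        return minutes_since_os_start >= first_heartbeat_deadline(sp_sdel, sp_sti)
    return minutes_since_last_heartbeat >= sp_sti


print(first_heartbeat_deadline(10, 5))
```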

Surveillance on SMP Systems Each running processor in an SMP system should be covered by surveillance. The following requirements assure this coverage. R1--1. Each processor which is running, that is, not stopped by the stop-self RTAS call or not stopped due to BIST testing at bring-up, must issue the event-scan RTAS call. The rate of issue is the “rtas-event-scan-rate” times per minute divided by the number of processors. This is the minimum rate. R1--2. The system must allow for all processors to cycle through their event-scan calls. The timeout period for a surveillance event, which is sp-sti, must be greater than n times t, where n is the number of processors and t is the “rtas-event-scan-rate”. R1--3. The surveillance event must be signaled if after the surveillance interval, sp-sti, one or more processors has not issued an event-scan call. Implementation Note: Care is required in the assignment of the surveillance interval and the “rtas-event-scan-rate” such that a surveillance event is not signaled prematurely. The default values are not meant for a system with a large number of processors.
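As a worked example of R1--1, the per-processor minimum rate follows from dividing the “rtas-event-scan-rate” by the number of running processors. The helper below is a hypothetical sketch of that arithmetic only; it returns the longest interval between calls on any one processor that still meets the minimum rate.

```python
from fractions import Fraction


def per_processor_scan_interval_seconds(event_scan_rate: int,
                                        running_processors: int) -> Fraction:
    """Longest allowed gap, in seconds, between event-scan calls issued
    by a single running processor."""
    if event_scan_rate <= 0:
        raise ValueError("a rate of 0 means no periodic event-scan is expected")
    if running_processors <= 0:
        raise ValueError("at least one running processor is required")
    # Calls per minute that each processor must issue at minimum.
    per_cpu_rate = Fraction(event_scan_rate, running_processors)
    return Fraction(60) / per_cpu_rate


print(per_processor_scan_interval_seconds(1, 4))
```

With a rate of 1 call per minute and 4 running processors, each processor may wait up to 240 seconds between its own calls, which shows why the surveillance interval has to grow with the processor count.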

<emphasis>display-character</emphasis> The display-character function allows the display of both alphabetic and numeric information. The display for this function requires at least one line of four (4) characters. Also specified are the control characters carriage-return (CR) (0x0D) and line-feed (LF) (0x0A). The following OF properties are defined in : “ibm,display-line-length” “ibm,display-number-of-lines” “ibm,display-truncation-length” “ibm,form-feed” R1--1. If display-character is implemented on a platform, the property “ibm,display-line-length” in the /rtas node must be provided if greater than the required minimum default of 4 characters. R1--2. If display-character is implemented on a platform, the property “ibm,display-number-of-lines” in the /rtas node must be provided if greater than the required minimum default of 1 line. R1--3. If the “ibm,display-number-of-lines” is greater than one, the platform must support form-feed (FF) (0x0C). R1--4. If form-feed is implemented, it must clear the display and position the display pointer to line 1 column 1. R1--5. The platform must include the property “ibm,form-feed” in the /rtas node. R1--6. For the display-character RTAS call, when the truncation length as specified in the “ibm,display-truncation-length” property, when it exists, is less than the length of the line being displayed on that particular line, then the firmware must truncate the requested line to be displayed to the length specified in the “ibm,display-truncation-length” property for that line. R1--7. For the display-character RTAS call, when the truncation length as specified in the “ibm,display-truncation-length” property, when it exists, is greater than the length specified of the line as specified in “ibm,display-line-length” then the platform must provide a platform-dependent method of displaying the line to the user. R1--8. 
For platforms that use converged location codes, the platform must provide scrolling for the display-character RTAS call, on the second line of the display, and must provide the “ibm,display-truncation-length” property and specify a truncation length of no less than 80 characters for that line. Platform and Software Implementation Note: In implementing Requirements and , it is permissible to have a separate buffer for any of the lines of the display and not display that line until a button is pressed. The RTAS call display-character can be used by the OS to display informative messages during boot, or to display error messages when an error has occurred and the OS cannot depend on its display drivers. This call is intended to display the alpha-numeric characters on an LCD panel, graphics console, or attached tty. The precise implementation is platform vendor specific. R1--9. RTAS must implement a display-character call using the argument call buffer defined by to place a character on the output device. R1--10. The OS must serialize all calls to display-character with any other use of the rtas-display-device. Argument Call Buffer <emphasis>display-character</emphasis> Parameter Type Name Values In Token Token for display-character Number Inputs 1 Number Outputs 1 Value Character to be displayed Out Status 0: Success -1: Hardware error -2: Device busy, try again later

R1--11. If a physical output device is used for the output of the RTAS display-character call, then it must have at least one line and 4 characters. R1--12. Certain ASCII control characters must have their normal meanings with respect to position on output devices which are capable of cursor positioning. In particular, ^M (0x0D) must position the cursor at column 0 in the current line, and ^J (0x0A) must move the cursor to the next line. If on the bottom line, move to column 0 and scroll old data off the top. R1--13. The ASCII characters which must be displayed are generally those coded from 0x20 to 0x7E as shown in . SP indicates a space and ND is not defined.
Display ASCII Characters
Hex Disp   Hex Disp   Hex Disp   Hex Disp   Hex Disp   Hex Disp
20 SP      30 0       40 @       50 P       60 `       70 p
21 !       31 1       41 A       51 Q       61 a       71 q
22 "       32 2       42 B       52 R       62 b       72 r
23 #       33 3       43 C       53 S       63 c       73 s
24 $       34 4       44 D       54 T       64 d       74 t
25 %       35 5       45 E       55 U       65 e       75 u
26 &       36 6       46 F       56 V       66 f       76 v
27 '       37 7       47 G       57 W       67 g       77 w
28 (       38 8       48 H       58 X       68 h       78 x
29 )       39 9       49 I       59 Y       69 i       79 y
2A *       3A :       4A J       5A Z       6A j       7A z
2B +       3B ;       4B K       5B [       6B k       7B {
2C ,       3C <       4C L       5C \       6C l       7C |
2D -       3D =       4D M       5D ]       6D m       7D }
2E .       3E >       4E N       5E ^       6E n       7E ~
2F /       3F ?       4F O       5F _       6F o       7F ND
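The cursor rules for CR, LF, and form-feed can be made concrete with a toy model of the panel. This is a sketch under stated assumptions, not an RTAS interface: the OpPanel class is hypothetical, LF is assumed to keep the current column except on the bottom line, and characters written past the end of a line are simply dropped.

```python
class OpPanel:
    """Toy model of an operator panel driven one character at a time."""

    def __init__(self, lines: int = 1, length: int = 4):
        self.length = length
        self.rows = [[" "] * length for _ in range(lines)]
        self.row = 0
        self.col = 0

    def put(self, ch: int) -> None:
        if ch == 0x0D:                       # CR: column 0, same line
            self.col = 0
        elif ch == 0x0A:                     # LF: next line
            if self.row < len(self.rows) - 1:
                self.row += 1
            else:                            # bottom line: scroll, column 0
                self.rows.pop(0)
                self.rows.append([" "] * self.length)
                self.col = 0
        elif ch == 0x0C:                     # FF: clear, line 1 column 1
            self.rows = [[" "] * self.length for _ in self.rows]
            self.row = 0
            self.col = 0
        elif 0x20 <= ch <= 0x7E:             # displayable range
            if self.col < self.length:
                self.rows[self.row][self.col] = chr(ch)
                self.col += 1
        # any other code is ignored in this sketch

    def text(self) -> list:
        return ["".join(r) for r in self.rows]


panel = OpPanel(lines=2, length=16)
for byte in b"BOOT\r\nE1A2":
    panel.put(byte)
print(panel.text())
```

Feeding one byte per call mirrors the single Value input of display-character; a real caller would also have to handle the -2 (Device busy) Status by retrying.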

Software Implementation Note: Care should be taken in using the full character set for all systems as some characters may not be available or may display in a different fashion. For instance, the currency symbol, $ (0x24), may be modified to a national currency symbol. Other currently known differences occur for the reverse slant, \ (0x5C), and the tilde, ~ (0x7E). R1--14. RTAS must not output characters to the rtas-display-device other than in response to explicit calls from the OS to the display-character function, except under the following conditions. The rtas-display-device is marked “used-by-rtas”. The RTAS call is power-off, ibm,power-off-ups, set-power-level (0,0), or system-reboot. Software Implementation Notes: RTAS should try to produce output to the user. This could be to the system console, to an attached terminal, or to some other device. It could be implemented using a diagnostic processor or network. RTAS could also implement this call by storing the messages in a buffer in NVRAM so the user could determine the reason for a crash upon reboot. This call modifies the registers associated with the rtas-display-device. The OS may also access this device, being aware that calls to display-character change the state of the device.

<emphasis>set-indicator</emphasis> The RTAS set-indicator function provides the OS with an abstraction for controlling various lights, indicators, and other resources on a platform. If multiple indicators of a given type are provided by the platform, this function permits addressing them individually. R1--1. RTAS must implement a set-indicator call which sets the value of the indicator of type Indicator and index Indicator-index using the argument call buffer defined by and indicator types defined by .
Argument Call Buffer <emphasis>set-indicator</emphasis>
Parameter Type | Name | Values
In | Token | Token for set-indicator
In | Number Inputs | 3
In | Number Outputs | 1
In | Indicator | Token defining the type of indicator
In | Indicator-index | Index of specific indicator (0, 1,...)
In | State | Desired new state
Out | Status | 990x: Extended Delay; 0: Success; -1: Hardware Error; -2: Hardware busy, try again later; -3: No such indicator implemented; -9000: Multi-level isolation error; -9001: Valid outstanding translation

R1--2. For indicators in the “rtas-indicators” property, the indices for indicators must start at zero (0) and increment sequentially up to the maximum index; that is, all of the integers and only those integers from 0 to the maximum index are valid. Architecture Note: Indicator indices that are obtained via the ibm,get-indices RTAS call are not necessarily contiguous (that is, any of the indices between 0 and the maxindex, inclusive, may be missing). R1--3. Of the indicator types defined by , RTAS must implement at least Tone Frequency and Tone Volume. R1--4. The set-indicator RTAS call must not return a busy indication (-2 or 990x) for any indicator in which is marked with a “yes” in the “Fast?” column of that table. R1--5. The platform may, but is not required to, turn off a tone automatically after 5 minutes or more duration (that is, automatically set the Tone Volume to zero), and therefore a user of the Tone must call set-indicator Tone Volume with a volume value of non-zero, if a tone is to be sustained longer than 5 minutes, and if the platform is going to automatically terminate the tone, the platform must reset its automatic turn-off timer when it receives a set-indicator call for the Tone Volume with a non-zero tone volume value. Defined Indicators Indicator Name Token Value Defined Values Default Value Fast? Required? <vendor> Values in the “<vendor>” column are used to replace the “ <vendor>” field of the “<vendor>,indicator-<token>” property, when that property is presented. See Requirement . Examples/Comments Tone Frequency 1 Unsigned Integer (units are Hz) 1000 yes When tone is required. See Requirement . ibm Generate an audible tone using the tone generator hardware. RTAS selects the closest implemented audible frequency to the requested value. Tone Volume 2 0-100 (units are percent), 0 = OFF 0 yes When tone is required. See Requirement . ibm Set the percentage of full volume of the tone generator output, scaled approximately logarithmically. 
RTAS should select the closest implemented volume for values between zero (off) and 100 (full on). - 3-6 - - - - - Reserved. - 7 - - - - - Reserved. Was (deprecated) Battery Warning Time. - 8 - - - - - Reserved. Was (deprecated) Condition Cycle Request. Surveillance 9000 0-disabled 1-255-timeout sp-sti yes When the platform implements the surveillance function. ibm Initialized with value from the sp-sti system parameter. Isolation-state 9001 Isolate = 0 Unisolate = 1 1 no For all DR options - Isolate refers to the DR action to logically disconnect from the platform and/or OS (for example, for PCI, isolate from the bus and from the OS). See for more details. DR 9002 Inactive = 0 Active = 1 Identify = 2 Action = 3 0 if Inactive 1 if Active no For all DR options - Indicator index may refer to a single indicator that combines Power/Active indicator and Identify/Action indications or just an Identify/Action indicator. Identify and Action may map to the same visual state (for example, the same blink rate). See and for more information. Allocation-state 9003 unusable (0) usable (1) exchange (2) recover (3) no For all DR options - Allows an OS image to assign (usable, exchange, or recover) resources from the firmware or, release resources from the OS to the firmware. See for more details. - 9004 - - - - - Reserved. Global Interrupt Queue Control 9005 Disable = 0 Enable = 1 1 yes See Requirement . ibm Enable and Disable the processor as Global Interrupt Queue Server Error Log or FRU Fault 9006 Normal (off) = 0 Fault (on) = 1 0 no Yes See . ibm This indicator is combined with the Identify indicator for the Primary Enclosure drawer/enclosure (that is, is the same physical indicator). Off indicates that the system is working normally. On indicates that the system hardware, firmware and/or diagnostics detected a fault (failure) in the system or a partition requires operator intervention for another reason. The Error Log indicator is located only on the Primary Enclosure. 
See and for more information. Identify (Locate) 9007 Normal (off) = 0 Identify (blink) = 1 0 no Yes See . ibm Note that a 9002 indicator also has an Identify state, and in the case where the 9002 indicator is implemented with two physical indicators (one for Power and one for Identify/Action), the same physical indicator must be used for both a 9002 Identify/Action indicator and 9007 Identify indicator. This architecture does not specify any mechanism for protecting against the simultaneous use by the user of an indicator that is both a 9002 and 9007 indicator, nor does it protect against the use of multiple 9007 indicators simultaneously or multiple uses of the same 9007 indicator simultaneously. See and for more information. - 9008 - - - - - Reserved. - 9009 - - - - - Reserved. Vendor Specific 9100-9999 <vendor> The vendor specific company representation, as used on other OF properties specified by that vendor. Indicator values reserved for platform vendor use.

Indicators

Indicator 9000 Surveillance An indicator is defined with the token value 9000 to allow temporary modification of the state of the surveillance function (further described in ). To enable monitoring of heartbeats from the event-scan RTAS call, the surveillance indicator is set with a value of 1 to 255, indicating the number of minutes for the surveillance time-out value. If monitoring is already enabled, the time-out value can be modified by setting this indicator. To disable monitoring, the surveillance indicator should be set to a value of zero (0). The set-indicator call is used to modify the state of surveillance (overriding the default system parameter values) only for the current session. The surveillance state returns to the default values when the system is rebooted. The default surveillance configuration may be modified by changing the system parameters. For more information on these parameters, refer to . R1--1. Platforms with the surveillance function must implement a sensor and an indicator, with the token value of 9000, with defined state input values of on (= 1-255, which enables surveillance with specified time-out value in minutes) and off (= 0, which disables surveillance). Firmware Implementation Note: The requirement above results in the creation of the properties “ibm,indicator-9000” and “ibm,sensor-9000” in the /rtas node. Hardware Implementation Note: The action that the service processor takes in the case of a timeout is determined by the configuration setup policy in the system parameters.

Indicator 9005 Global Interrupt Queue Control The 9005 indicator controls the global interrupt server queue logic of the interrupt presentation controllers for the processor making the call (Available Processor Mask (APM) for the PowerPC interrupt presentation controller). This is used when bringing a processor online and taking a processor offline. R1--1. Platforms that allow processors to be brought online or be taken offline dynamically must implement the global interrupt queue control indicator with a value of 9005 as specified in . R1--2. The index value for global interrupt queue control indicator (9005) must be (2^“ibm,interrupt-server#-size”) - 1 - the gserver# of the global server to be controlled as given in the “ibm,ppc-interrupt-gserver#s” property.
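The index rule in R1--2 is simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes the first term of the expression is 2 raised to the value of the interrupt server number size property (a width in bits).

```python
def giq_indicator_index(interrupt_server_size_bits: int, gserver: int) -> int:
    """Indicator-index to pass to set-indicator for indicator 9005."""
    if interrupt_server_size_bits <= 0:
        raise ValueError("size must be a positive number of bits")
    top = (1 << interrupt_server_size_bits) - 1  # 2^size - 1
    if not 0 <= gserver <= top:
        raise ValueError("gserver# does not fit in the given size")
    return top - gserver


print(giq_indicator_index(8, 3))
```

For an 8-bit server number space, global server 0 maps to index 255 and global server 255 maps to index 0.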

<emphasis>get-sensor-state</emphasis> The RTAS call get-sensor-state is used by the OS to read the current state of various sensors on any Platform. If multiple sensors of a given type are provided by the platform, this function permits addressing them individually. R1--1. RTAS must implement a get-sensor-state call which reads the value of the sensor of type Sensor which has index Sensor-index using the argument call buffer defined by and the sensor types defined by . R1--2. If a platform tests sensor values against limits, then RTAS must return the result of these tests using the Status output parameter.
<emphasis>get-sensor-state</emphasis> Argument Call Buffer
Parameter Type | Name | Values
In | Token | Token for get-sensor-state
In | Number Inputs | 2
In | Number Outputs | 2
In | Sensor | Token defining the sensor type
In | Sensor-index | Index of specific sensor (0, 1,...)
Out | Status | 990x: Extended Delay where x is a number 0-5 (see text below); 13: Sensor value >= Critical high; 12: Sensor value >= Warning high; 11: Sensor value normal; 10: Sensor value <= Warning low; 9: Sensor value <= Critical low; 0: Success; -1: Hardware Error; -2: Hardware Busy, Try again later; -3: No such sensor implemented; -9000: DR Entity isolated ( )
Out | State | Current value as defined in the Defined Values column of .
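A caller typically needs to sort the Status values above into three groups: extended delay, limit comparison results, and everything else. The sketch below shows one way to do that; the names are hypothetical, and the delay of 10 to the power x milliseconds is the suggestion made in the Software Implementation Note for this call, not a mandated wait.

```python
LIMIT_STATUS = {
    13: "critical high",
    12: "warning high",
    11: "normal",
    10: "warning low",
    9: "critical low",
}


def extended_delay_ms(status: int):
    """Suggested delay for a 990x Status (x in 0..5), else None."""
    if 9900 <= status <= 9905:
        return 10 ** (status - 9900)
    return None


def describe_status(status: int) -> str:
    """Short label for a get-sensor-state Status value."""
    delay = extended_delay_ms(status)
    if delay is not None:
        return "retry after about %d ms" % delay
    if status in LIMIT_STATUS:
        return "sensor value " + LIMIT_STATUS[status]
    if status == 0:
        return "success"
    return "error %d" % status


print(describe_status(9903))
```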

Software Implementation Note: When the 990x Status is returned, it is suggested that software delay for 10 raised to the power x milliseconds (where x is the last digit of the 990x return code), before calling get-sensor-state again. However, software may issue the get-sensor-state call again either earlier or later than this. R1--3. For sensors in the “rtas-sensors” property, the indices for sensors must start at zero (0) and increment sequentially up to the maximum index; that is, all of the integers and only those integers from 0 to the maximum index are valid. Architecture Note: Sensor indices that are obtained via the ibm,get-indices RTAS call are not necessarily contiguous (that is, any of the indices between 0 and the maxindex, inclusive, may be missing). R1--4. The get-sensor-state RTAS call must not return a busy indication (-2 or 990x) for any sensor in the Defined Sensors table which is marked with a “yes” in the “Fast?” column of that table. Hardware Implementation Note: Some platforms may compare the value of environmental sensors (such as the Battery Voltage or Thermal Sensor) to some limits. When the value of the sensor meets or exceeds a limit, the platform may take some action. RTAS makes the OS aware of the relationship of the sensor values to the limit by using the Status code to return this information. Software and Hardware Implementation Notes: The meaning of these limits is as follows: Critical High - The sensor value is greater than or equal to this limit. The platform may take some action and may initiate an EPOW (see ). The OS may take some action to correct this situation or to perform an orderly shutdown. Warning High - The sensor value is greater than or equal to this limit, but less than the critical high limit. The platform may initiate a warning EPOW. The OS may take some action to bring this reading back into the normal range. Normal - RTAS is aware of the limits and the value is within these operating limits. 
Warning Low - The sensor value is less than or equal to this limit, but greater than the critical low limit. The platform may initiate a warning EPOW. The OS may take some action to bring this reading back into the normal range. Critical Low - The sensor value is less than or equal to this limit. The platform may take some action and may initiate an EPOW. The OS may take some action to correct this situation or to perform an orderly shutdown. Where: A ‘critical’ state is defined as a condition where the sensor value of the measured item indicates that it is outside the allowable operating parameters of the system, and that a failure is imminent unless some immediate action is taken. A ‘warning’ state is defined as a condition where the sensor value of the measured item indicates that it is outside the expected operating parameters for normal operation, but has not yet reached a critical state. The variance is significant enough that either system software or an operator may want to take some action to bring the parameter back into the normal range. Platform Implementation Note: The existence of this sensor state reporting capability should not be construed as a requirement to have any limits on sensors or to always have all four limits. Defined Sensors Sensor Name Token Value Defined Values Fast? Required? <vendor> Values in the “<vendor>” column are used to replace the “ <vendor>” field of the “<vendor>,sensor-<token>” property, when that property is presented. See Requirement . Description Key Switch 1 Off (0), Normal (1), Secure (2), Maintenance (3) yes No ibm Key switch modes are tied to OS security policy. Suggested meanings: Maintenance mode permits booting from floppy or other external, non-secure media. Normal mode permits boot from any attached device. Secure mode permits no manual choice of boot device, and may restrict available functionality which is accessed from the main operator station. Off completely disables the system. - 2 - - - - Reserved. 
Thermal 3 Temperature (in Degrees Celsius) no No ibm If implemented, returns the internal temperature of the specified thermal sensor. - 4 - - - - Reserved. Was (deprecated) Lid Status. - 5 - - - - Reserved. - 6 - - - - Reserved. Was (deprecated) Current battery output voltage. - 7 - - - - Reserved. Was (deprecated) Battery Capacity Remaining. - 8 - - - - Reserved. Was (deprecated) Battery Capacity Percentage. Environmental and Power State (EPOW) 9 EPOW_Reset(0) Warn_Cooling(1) Warn_Power(2) System_Shutdown(3) System_Halt(4) EPOW_Main_Enclosure(5) EPOW_Power_Off(7) yes Yes ibm RTAS assessment of the environment and power state of the platform. - 10 - - - - Reserved. Was (deprecated) Battery Condition Cycle State. - 11 - - - - Reserved. Was (deprecated) Battery Charging State. Surveillance 9000 1-255 and 0 yes When the platform implements the surveillance function ibm Current state of surveillance. Fan speed 9001 fan - rpm no No ibm Voltage 9002 voltage - mv no No ibm DR-entity-sense 9003 DR connector empty = 0 DR entity present = 1 DR entity unusable (2) DR entity available for exchange (3) DR entity available for recovery (4) no For all DR options - Used in Dynamic Reconfiguration operations to determine if connector is available and whether the user performed a particular DR operation correctly. See and . Power Supply 9004 no No ibm Sense presence and status of power supplies. Global Interrupt Queue Control 9005 Disabled = 0 Enabled = 1 yes See Requirement . ibm Global interrupt queue server control state. Error Log or FRU Fault 9006 Normal (off) = 0 Fault (on) = 1 no Yes See . ibm Off indicates that the system is working normally. On indicates that the system hardware, firmware and/or diagnostics detected a fault in the system. Identify 9007 Normal (off) = 0 Identify (blink) = 1 no Yes See . ibm Identify (locate) indicator (FRU, connector, or drawer/unit). Off is the default State. On indicates the Identify State. - 9008 - - - ibm Reserved. 
- 9009 - - - ibm Reserved. Vendor Specific 9100-9999 <vendor> The vendor specific company representation, as used on other OF properties specified by that vendor. Reserved for use by platform vendors.
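The Status handling described for get-sensor-state (limit results 9 through 13, the -2 busy code, and the suggested 990x delay of 10 raised to the x milliseconds) can be sketched as follows. This is a minimal illustration, not part of the architecture: the function and dictionary names are hypothetical, and only the numeric codes are taken from the argument call buffer above.

```python
# Hypothetical helper names; only the numeric Status codes come from the
# get-sensor-state argument call buffer.
LIMIT_STATUS = {
    13: "critical high",
    12: "warning high",
    11: "normal",
    10: "warning low",
    9: "critical low",
    0: "success",
}


def extended_delay_ms(status):
    """Return the suggested delay in milliseconds for a 990x Status, else None."""
    if 9900 <= status <= 9905:
        return 10 ** (status - 9900)
    return None


def classify_sensor_status(status):
    """Map a Status value to ("retry", delay_ms), ("ok", meaning) or ("error", status)."""
    delay = extended_delay_ms(status)
    if delay is not None:
        return ("retry", delay)
    if status == -2:
        # Hardware busy: the caller may retry without a suggested delay.
        return ("retry", 0)
    if status in LIMIT_STATUS:
        return ("ok", LIMIT_STATUS[status])
    return ("error", status)
```

The delay is only a suggestion; as noted above, software may reissue the call earlier or later.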

Sensors The current state of surveillance, as described in , is queried with a call to get-sensor-state with a token value of 9000. Fan speed is queried with the token value of 9001 and an index specifying the desired fan. Similarly, voltage is sensed with a token value of 9002 and an index specifying the desired voltage source. R1--1. Platforms which implement the surveillance function must implement a single defined RTAS sensor with the token value of 9000, which returns values of on (= 1-255 minutes) and off (= 0) to show the current state of surveillance during this session. R1--2. Platforms with software visible fan speed sensors must implement them as defined RTAS sensors with the token value of 9001, which returns a sensor value in revolutions per minute (RPM). R1--3. Platforms with software visible voltage sensors must implement them as defined RTAS sensors with the token value of 9002, which returns a sensor value in millivolts. Hardware Implementation Note: The notion of a delay, due to the sensor data acquisition time, may make it desirable to cache sensor data to avoid interlocking with the service processor. Software Implementation Note: Software should not assume that sensor data returned is a real time reading.

Example Implementation of Sensors An example implementation of a platform with a service processor and four fans and four voltage sensors is represented by the paired integers (token maxindex) in the OF device tree as shown in .

Example - Contents of <emphasis role="bold"><literal>“rtas-sensors”</literal></emphasis> property
token                                             maxindex
(Any sensors with Standard Sensor Tokens)...      (Associated maxindex values)...
9000 (surveillance)                               0000
9001 (fan-speed)                                  0003
9002 (voltage)                                    0003

This requires sensors such as those shown in .

Example - Sensor Definitions
sensor              token   index   value
surveillance        9000    0000    0 / 1-255
fan#1 fan speed     9001    0000    fan rpm
fan#2 fan speed     9001    0001    fan rpm
fan#3 fan speed     9001    0002    fan rpm
fan#4 fan speed     9001    0003    fan rpm
voltage-level #1    9002    0000    voltage - mv
voltage-level #2    9002    0001    voltage - mv
voltage-level #3    9002    0002    voltage - mv
voltage-level #4    9002    0003    voltage - mv

In addition, the /rtas node contains the properties “ibm,sensor-9000”, “ibm,sensor-9001” and “ibm,sensor-9002”, each of which contains an array of strings. Each entry in the array contains the location code for the matching sensor. For example, the first entry of “ibm,sensor-9001” contains the location code for fan#1. Location codes are shown in . Because surveillance is an abstracted sensor, the entry for “ibm,sensor-9000” is NULL.
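The way the (token maxindex) pairs of the example “rtas-sensors” property expand into individually addressable sensors can be sketched as follows. The function name and the flat-list input are assumptions made for illustration; reading the property from the device tree is not shown. The expansion relies on the requirement that indices in “rtas-sensors” run contiguously from 0 to maxindex, so it does not apply to indices obtained through ibm,get-indices.

```python
def expand_rtas_sensors(cells):
    """Expand a flat [token, maxindex, token, maxindex, ...] list into
    (token, index) pairs, one per addressable sensor."""
    if len(cells) % 2:
        raise ValueError("expected (token, maxindex) pairs")
    sensors = []
    for token, maxindex in zip(cells[0::2], cells[1::2]):
        # Indices are contiguous from 0 to maxindex, inclusive.
        sensors.extend((token, index) for index in range(maxindex + 1))
    return sensors


# Values from the example platform: surveillance, four fans, four voltages.
example_cells = [9000, 0, 9001, 3, 9002, 3]
example_sensors = expand_rtas_sensors(example_cells)
```

Each resulting pair corresponds to one get-sensor-state call with Sensor set to the token and Sensor-index set to the index.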

Power Supply Sensors R1--1. Platforms with multiple software visible power supply sensors must implement them as defined RTAS sensors with the token value of 9004, which returns the values defined in .

Power Supply Sensor Values
Value   Status
0       Not present
1       Present and Not operational
2       Status unknown
3       Present and operational

For static 9004 sensors, the maxindex in the “rtas-sensors” property for the token 9004 indicates the number of power supplies supported by the platform. In this case, the property “ibm,sensor-9004” in the /rtas node contains the location code for each index. For dynamic 9004 sensors, the platform provides the information about the 9004 sensors as it would for other dynamic sensors. That is, the platform does not provide the “ibm,sensor-9004” property and instead provides the 9004 location code information through the ibm,get-indices RTAS call. If the ibm,get-indices RTAS call returns an index of all-1's for a 9004 sensor, then the ibm,get-dynamic-sensor-state RTAS call is used to get the sensor state, instead of the get-sensor-state RTAS call.

Environmental Sensors R1--1. Platforms which want to allow an application to analyze their environmental sensors must provide the property “ibm,environmental-sensors” in the /rtas node (see ). The value of this property is a list of integers that are the token values (token) for the defined environmental sensors and the number of sensors (maxindex) for that token which are implemented on the platform. Architecture Note: When a sensor is in the “ibm,environmental-sensors” property and when the sensor token indices are obtained via the ibm,get-indices RTAS call, the indices may not be contiguous for that sensor token (that is, any of the indices between 0 and the maxindex, inclusive, may be missing).

Sensor 9005 Global Interrupt Queue Control State The 9005 sensor reports the state of the global interrupt server queue logic of the interrupt presentation controller for the specific processor making the call (Available Processor Mask (APM) for the PowerPC interrupt presentation controller). This is used when varying the processor on and off line. R1--1. Platforms that allow processors to be brought online or be taken offline dynamically must implement the global interrupt queue control sensor with a token value of 9005 as specified in . R1--2. The index value for the global interrupt queue control state sensor (9005) must be (2 raised to the power ibm,interrupt-server#-size) - 1 - gserver#, where gserver# is the server number of the global queue to be sensed as given in the “ibm,ppc-interrupt-gserver#s” property. Note: On platforms that do not report the “ibm,interrupt-server#-size” property, the assumed value of the size of the interrupt server number is 8.
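As a worked example of the index rule in R1--2, read here as 2 raised to the power of the “ibm,interrupt-server#-size” value, minus 1, minus the gserver#, the following sketch computes the Sensor-index for token 9005. The function name and its parameters are hypothetical; the default size of 8 is the assumed value stated in the note above.

```python
DEFAULT_SERVER_SIZE = 8  # assumed size when "ibm,interrupt-server#-size" is absent


def giq_sensor_index(gserver, server_size=None):
    """Return the Sensor-index for sensor 9005 for the given global server number."""
    size = DEFAULT_SERVER_SIZE if server_size is None else server_size
    max_server = (1 << size) - 1  # 2**size - 1
    if not 0 <= gserver <= max_server:
        raise ValueError("gserver# does not fit in the interrupt server number size")
    return max_server - gserver
```

For instance, with the default size of 8, a gserver# of 0 maps to index 255 and a gserver# of 3 maps to index 252.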

Power Control

<emphasis>set-power-level</emphasis> This RTAS call is used to set the power level of a power domain to either on or off. R1--1. RTAS must implement the set-power-level call using the argument call buffer defined by . Argument Call Buffer <emphasis>set-power-level</emphasis> Parameter Type Name Values In Token Token for set-power-level Number Inputs 2 Number Outputs 2 Power_domain Token defining the power domain Level Token for the desired level for this domain Out Status 0: Success -1: Hardware Error -2: Busy, Try again later 990x:Extended Delay where x is a number 0-5 (see text below) Actual_level The power level actually set

R1--2. Power_domain must be a power domain identified in the OF device tree. R1--3. Level must be 100 for full power and 0 for off. R1--4. The set-power-level call must return the power level actually set in the Actual_level output parameter. Software Implementation Notes: The set-power-level(0,0) call, if implemented, removes power from the root domain, turning off power to all domains. The external events which can turn power back on are platform specific. The RTAS primitive power-off also removes power from the system, but permits specifying the events which can turn power back on. The implemented values for the Level parameter for each power domain are defined in the OF device tree. R1--5. The set-power-level RTAS call, when implemented, must return either a -2 or a 990x return code if the set-power-level operation specified in the RTAS call is going to exceed 1 millisecond in duration (where value of x gives a hint as to the duration of the busy; see text). A single set-power-level operation may require an extended period of time for execution. Following the initiation of the hardware operation to change the power level, if the set-power-level call returns prior to successful completion of the operation, the call returns either a Status code of -2 or 990x. A Status code of -2 indicates that RTAS may be capable of doing useful processing immediately. A Status code of 990x indicates that the platform requires an extended period of time, and hints at how much time is required. Neither the 990x nor the -2 Status codes implies that the platform has initiated the operation, but it is expected that the 990x Status is used only if the operation had been initiated. When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling set-power-level with the same power domain token. 
However, software may issue the set-power-level call again either earlier or later than this. Software Implementation Note: In Requirement , a return code of -2 or 990x may either mean that the operation was initiated but not completed, or may mean that the operation was not initiated at all. Firmware Implementation Notes: If RTAS initiates the operation and returns before its successful completion, then it needs to handle the split of a set-power-level operation across multiple calls. It is the firmware’s responsibility to not return a Status of 0 (success) until the operation is complete, and that may require performing an operation such as a delay operation or querying the hardware for power good status. In the former case, the firmware needs to save state between the calls to the same power domain number, until the operation is complete. The set-power-level RTAS call may be called to set the power level of a power domain after operations on other domains have been initiated and before those operations are complete. If necessary, the set-power-level call may return a -2 or 990x Status to those calls without initiating the operation, if multiple simultaneous operations are not feasible.
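The retry behavior described above for set-power-level can be sketched from the OS side as follows. This is an illustration under stated assumptions: rtas_call is a hypothetical callable that issues the call and returns the pair (Status, Actual_level), and the attempt limit and the sleep_ms hook are not part of the architecture. The loop repeats the call with the same power domain token until a Status of 0 is returned.

```python
import time


def _sleep_ms(ms):
    time.sleep(ms / 1000.0)


def set_power_level(rtas_call, power_domain, level, max_attempts=50, sleep_ms=_sleep_ms):
    """Repeat set-power-level until it completes; return Actual_level.

    rtas_call(power_domain, level) is a hypothetical wrapper returning
    (status, actual_level).
    """
    for _ in range(max_attempts):
        status, actual_level = rtas_call(power_domain, level)
        if status == 0:
            return actual_level
        if status == -2:
            continue  # busy: retry without a suggested delay
        if 9900 <= status <= 9905:
            sleep_ms(10 ** (status - 9900))  # suggested delay of 10**x ms
            continue
        raise RuntimeError("set-power-level failed with status %d" % status)
    raise TimeoutError("set-power-level still busy after %d attempts" % max_attempts)
```

Because neither -2 nor 990x guarantees that the operation was initiated, the sketch treats both simply as "call again" and draws no conclusion about hardware state.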

<emphasis>get-power-level</emphasis> R1--1. RTAS must implement the get-power-level call using the argument call buffer defined by . Argument Call Buffer <emphasis>get-power-level</emphasis> Parameter Type Name Values In Token Token for get-power-level Number Inputs 1 Number Outputs 2 Power_domain Token defining the power domain Out Status 0: Success -1: Hardware Error -2: Busy, try again later -3: Can’t determine current level Level The current power level for this domain

R1--2. Power_domain must be a power domain identified in the OF device tree. Software Implementation Note: The get-power-level call only returns information about power levels whose state is readable in hardware. It does not need to remember the last set state and return that value.

<emphasis>power-off</emphasis> This primitive turns power off on a system which is equipped to perform a software-controlled power off function. R1--1. If software controlled power-off hardware is present, the power-off function must turn off power to the platform, using the argument call buffer described in . <emphasis>power-off</emphasis> Argument Call Buffer Parameter Type Name Values In Token Token for power-off Number Inputs 2 Number Outputs 1 Power_on_mask_hi Mask of events that can cause a power on event - event mask values [0:31] (right-justified if the cell size is 64 bits) Power_on_mask_lo Mask of events that can cause a power on event - event mask values [32:63] (right-justified if the cell size is 64 bits) Out Status On successful operation, does not return -1: Hardware error

R1--2. If software controlled power-off hardware is present, Power_on_mask, which is passed in two parts to permit a possible 64 events even on 32-bit implementations, must be a bit mask of power on triggers, or if the “power-on-triggers” property is absent from the /rtas node, a value of 0 must be used for Power_on_mask_hi and Power_on_mask_lo. R1--3. Platforms must omit the “power-on-triggers” property from the /rtas node. Implementation Note: The power on triggers, which were removed from this architecture, are documented in the , for legacy reasons. Defined Power On Triggers Bit Event 0 Power Switch On 2 Lid Open 5 Wake Button 8 Switch to Battery 9 Switch to AC 10 Keyboard or mouse activity 12 Enclosure Closed 13 Ring Indicate 14 LAN Attention 15 Time Alarm 16 Configuration change 17 Service Processor

R1--4. For the System Parameters option: If software controlled power-off hardware is present, the power-off function must prevent reboot in the event of a later external power recovery with the platform_auto_power_restart system parameter enabled.

<emphasis>ibm,power-off-ups</emphasis> This RTAS call manages the system power-off function in systems which may have power backed up with an Uninterruptible Power Supply (UPS). R1--1. For platforms that support a platform controlled Uninterruptible Power Supply (UPS), the ibm,power-off-ups function must be implemented, whether a platform controlled UPS is present or not, using the argument call buffer described in . Argument Call Buffer <emphasis>ibm,power-off-ups</emphasis> Parameter Type Name Values In Token Token for ibm,power-off-ups Number Inputs 0 Number Outputs 1 Out Status On successful operation, does not return -1: Hardware error

R1--2. If a platform controlled UPS is present, then the ibm,power-off-ups RTAS call must turn off system power while enabling platform auto restart upon restoration of system power, according to the platform_auto_power_restart policy described in , and must not return; otherwise, the call must not turn off system power and must not return. R1--3. If a platform controlled UPS is not present, then the ibm,power-off-ups RTAS call must turn off system power while enabling platform auto restart upon restoration of system power, according to the platform_auto_power_restart policy described in , and must not return; otherwise, the call must not turn off system power and must not return. Software Implementation Notes: Supporting ibm,power-off-ups allows a system with a platform managed UPS to be shut down in response to a report that the system is running under UPS power. Unlike power-off, ibm,power-off-ups permits the operating system to be restarted when power is restored after a loss of external power. The report that a system needs to be shut down because it is running under a UPS is given by the platform as an EPOW event with the EPOW event modifier 0x02 (loss of utility power, system is running on UPS/Battery), as described in section . If the RTAS ibm,power-off-ups call is supported by the platform, it also allows a shutdown with a subsequent restart when power is restored for systems running with a UPS that is not under platform control. This presumes that the OS has some external means of recognizing when it is running under UPS power, so that it can initiate the ibm,power-off-ups call.

Reboot and Flash Update Calls During execution, it may become necessary to shut down processing and reboot the system in a new mode. For example, a different OS level may need to be loaded, or the same OS may need to be rebooted with different settings of System Environment Variables.

<emphasis>system-reboot</emphasis> R1--1. RTAS must implement a system-reboot call which resets all processors and all attached devices. After reset, the system must be booted with the current settings of the System Environment Variables (refer to for more information). R1--2. The RTAS system-reboot call must be implemented using the argument call buffer defined by . Argument Call Buffer <emphasis>system-reboot</emphasis> Parameter Type Name Values In Token Token for system-reboot Number Inputs 0 Number Outputs 1 Out Status On successful operation, does not return -1: Hardware error

Hardware Implementation Note: The platform must be able to perform a system reset and reboot. On a multiprocessor system, this should be a hard reset to the processors.

<emphasis>ibm,update-flash-64-and-reboot</emphasis> The ibm,update-flash-64-and-reboot function is described in this section. It does not return to the OS if successful. This call supports RTAS instantiated in 32 bit mode to access storage at addresses above 4GB. In an exception to the LPAR Requirement this call supports block lists being outside of the Real Mode Area (RMA) as long as the initial block list is at an address below the limits of the cell size of the Block_list argument. R1--1. The argument call buffer for the ibm,update-flash-64-and-reboot RTAS call must correspond to the definition in . Argument Call Buffer <emphasis>ibm,update-flash-64-and-reboot</emphasis> Parameter Type Name Values In Token Token for ibm,update-flash-64-and-reboot Number Inputs 1 Number Outputs 1 Block_list A real pointer to a block list of 64 bit entries Out Status -1: Hardware error -3: Image unacceptable to update program -4: Programming failed when partially complete, and the flash is now corrupted - reboot may fail -9002: Not authorized The Status of 0 is never returned, because this RTAS call does not return if successful. The -1 return is to cover the case where some condition prevents RTAS from being able to program the flash at this time. For example, the flash programming power supply is disconnected, a low-level security check (for instance a switch or jumper) fails, or a test programming probe fails for an unknown reason or the case where the flash has been successfully updated, but the reboot fails for some reason. The -3 return is to cover the case where embedded vendor/platform specific information in the image failed to conform to the required format or content for this platform, such as the firmware revision number or a CRC or some other check which was intended to ensure the integrity of the image. The -4 return is to cover the case where the update failed before the image was fully updated. In this case, the OS has the responsibility for reporting the failure. 
The -9002 return code is used to indicate that the partition at the time the call was made was not authorized to update the flash image.

R1--2. The RTAS ibm,update-flash-64-and-reboot call Block_list on platforms that do not present the “ibm,flash-block-version” property in the OF /rtas node must conform to the definition shown in . Format of Block List Length of Block_list in bytes Address of memory area 1 Length of memory area 1 . . . Address of memory area n Length of memory area n

R1--3. The ibm,update-flash-64-and-reboot RTAS call Block_list must be a sequence of 64 bit cells. R1--4. Memory blocks referenced in the ibm,update-flash-64-and-reboot RTAS call Block_list must reside in System Memory outside that reserved for firmware (both the RTAS data area and OF’s memory defined by real-base and real-size). R1--5. The block list referenced by the Block_list argument to the ibm,update-flash-64-and-reboot RTAS call must be in System Memory below the maximum address supported by the RTAS instantiated cell size. R1--6. The addresses of memory blocks referenced by the ibm,update-flash-64-and-reboot RTAS call Block_list must be aligned on a 4 KB boundary. R1--7. A memory block, included in the ibm,update-flash-64-and-reboot RTAS call Block_list, must not cross a 256 MB boundary. R1--8. The ibm,update-flash-64-and-reboot call must test the image to make sure it has the right format and is not damaged, update the flash from the Block_list and then perform a system reset and reboot, as for the system-reboot call. Hardware and Software Implementation Note: Platform specific information should be embedded in the flash images to identify the firmware unambiguously and to ensure that the firmware operates correctly on the platform. Such information might include platform board model and revision numbers covered by the firmware, manufacturer ID, and firmware revision number used for external display. This information should include a CRC or other check which ensures the integrity of the data. Software Implementation Notes: The execution time for this call may be in the order of seconds, rather than “a few tens of microseconds” as noted on page . The RTAS flash update programs should display progress, completion, and error information while the flash update is underway, if possible. 
The OS does not expect a return from the ibm,update-flash-64-and-reboot call other than for cases where the hardware cannot be accessed, the flash image is unacceptable to the RTAS flash update program, the result of the update corrupted the flash, or the platform could not be rebooted.
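The Block_list constraints above (64 bit cells, 4 KB aligned block addresses, no block crossing a 256 MB boundary) can be sketched as a small builder. Several points are assumptions made for illustration and are not stated in this section: the cells are packed big-endian, the length cell counts itself in the total byte length, and the function names are hypothetical. The checks on firmware-reserved memory and on the address of the list itself are not modeled.

```python
import struct

PAGE_SIZE = 4 * 1024             # blocks must start on a 4 KB boundary
SEGMENT_SIZE = 256 * 1024 * 1024  # blocks must not cross a 256 MB boundary


def check_block(address, length):
    """Raise ValueError if a memory block violates the alignment rules."""
    if address % PAGE_SIZE:
        raise ValueError("block address must be 4 KB aligned")
    if length <= 0:
        raise ValueError("block length must be positive")
    if address // SEGMENT_SIZE != (address + length - 1) // SEGMENT_SIZE:
        raise ValueError("block must not cross a 256 MB boundary")


def build_block_list(blocks):
    """Pack (address, length) pairs into a Block_list of 64 bit cells.

    Assumptions: big-endian cells; the leading length cell includes itself.
    """
    blocks = list(blocks)
    for address, length in blocks:
        check_block(address, length)
    cells = [8 * (1 + 2 * len(blocks))]
    for address, length in blocks:
        cells.extend((address, length))
    return struct.pack(">%dQ" % len(cells), *cells)
```

A block of 8 KB at address 0x10000, for example, yields a three-cell list of 24 bytes.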

Flash Update with Discontiguous Block Lists The property “ibm,flash-block-version” (see ) is defined to describe the following definition and operation of the Block_list shown in . Format of Discontiguous <emphasis>Block_list</emphasis> VER Length of Block_list in bytes Address of Block_list extension Address of memory area 1 Length of memory area 1 Address of memory area 2 Length of memory area 2 - - - Address of memory area n Length of memory area n

Where: VER (1 byte in length) indicates the version of the Block_list. Length of the Block_list in bytes indicates the size of this Block_list, including the header cell and the cell with the address of the Block_list extension. Address of the Block_list extension indicates the location of the next Block_list. 0x00 indicates no additional Block_list extension. Address of memory area 1 (2 . . . n) indicates the location of this portion of the flash image. Length of memory area 1 (2 . . . n) indicates the length of this portion of the flash image. R1--1. If VER is 0x01, the Block_list must be formatted as in . R1--2. If VER is 0x01, the Block_list parameter in the function call or the Address of the Block_list extension, if not 0x00, must point to a Block_list cell containing VER and Length of the Block_list. R1--3. If VER is 0x01, the Address of the Block_list extension parameter must be 0x00 to indicate that there are no further extensions. R1--4. The VER byte must exist in the Block_list and in each Block_list extension. R1--5. If the platform supports the property “ibm,flash-block-version” with value 0x01, it must also support the default value 0x00. The Block_list format allows flexibility in the size and page requirements for the block lists. Page alignment is not required for the lists or extensions. They may run across contiguous pages with the control being the length of each list or extension and with the end being the 0x00 pointer.
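Following a chain of Block_list extensions until the 0x00 pointer can be sketched as below. The byte layout is an assumption made only for illustration: the header cell is taken to hold VER in its most significant byte and the list length in the remaining 56 bits, cells are taken to be big-endian, and memory is modeled as a hypothetical dictionary from real address to the bytes stored there. None of these details is defined by the text above.

```python
import struct


def walk_block_lists(memory, address):
    """Collect (address, length) memory areas from a chain of version 0x01 lists.

    memory: hypothetical dict mapping a real address to the bytes at that address.
    """
    areas = []
    while address != 0:  # 0x00 means no further extension
        raw = memory[address]
        header, extension = struct.unpack_from(">2Q", raw, 0)
        version = header >> 56             # assumed: VER in the top byte
        length = header & ((1 << 56) - 1)  # assumed: length in the low 56 bits
        if version != 0x01:
            raise ValueError("unsupported Block_list version")
        # The length covers the header cell and the extension cell (16 bytes).
        if length < 16 or (length - 16) % 16:
            raise ValueError("malformed Block_list length")
        for i in range((length - 16) // 16):
            areas.append(struct.unpack_from(">2Q", raw, 16 + 16 * i))
        address = extension
    return areas
```

The walk is driven by the length of each list and ends at the 0x00 extension pointer, which matches the control described above; page alignment of the lists is not assumed.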

<emphasis>ibm,manage-flash-image</emphasis> The ibm,manage-flash-image RTAS call supports systems having “temporary” and “permanent” flash image areas. It allows the user to commit the temporary flash image by copying it to the permanent image area. It also allows the user to reject the temporary flash image by overwriting it with the permanent flash image. R1--1. The RTAS ibm,manage-flash-image call must be implemented using the argument call buffer defined by . Argument Call Buffer <emphasis>ibm,manage-flash-image</emphasis> Parameter Type Name Values In Token Token for ibm,manage-flash-image Number of Inputs 1 Number of Outputs 1 Image to Commit 0: Reject “temporary” firmware image 1: Commit “temporary” firmware image Out Status 0: Success -1: Hardware Error -2: Busy -3: Parameter Error -9001: Cannot Overwrite the Active Firmware Image Error -9002: Not Authorized 990x: Extended Delay

R1--2. The ibm,manage-flash-image RTAS call must not change the system flash and must return a Status of value -9001 when called with a request to reject the temporary firmware image when not running on the permanent firmware image. R1--3. The ibm,manage-flash-image RTAS call must not change the system flash and must return a Status of value -9001 when called with a request to commit the temporary firmware image when not running on the temporary firmware image. Platform Implementation Note: In platforms supporting two firmware image areas, platforms always apply updates to the temporary image area. The RTAS call ibm,manage-flash-image is the normal means by which a temporary image is committed to the permanent side. However, if a platform is running from a temporary image when an update is to be applied, then the platform may automatically commit the current temporary image to the permanent side to allow the new image to be updated to the temporary image area. The ibm,validate-flash-image RTAS call is used to determine what would result from an attempt to update a flash image, taking into account the image to be updated and the current image being executed.
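Requirements R1--2 and R1--3 amount to a precondition on the image the platform is currently running from. A minimal sketch of that check follows; the function name and the "T"/"P" labels for the running side are hypothetical, and the other Status values (busy, hardware error, not authorized) are not modeled.

```python
REJECT_TEMPORARY = 0  # "Image to Commit" input value 0
COMMIT_TEMPORARY = 1  # "Image to Commit" input value 1


def manage_flash_precheck(request, running_side):
    """Return the Status that the running-side rules alone would produce.

    running_side: "T" or "P", a hypothetical label for the image in use.
    """
    if request not in (REJECT_TEMPORARY, COMMIT_TEMPORARY):
        return -3  # Parameter Error
    if request == REJECT_TEMPORARY and running_side != "P":
        return -9001  # would overwrite the active (temporary) image
    if request == COMMIT_TEMPORARY and running_side != "T":
        return -9001  # commit is only permitted while running on the temporary image
    return 0
```

In both -9001 cases the system flash is left unchanged, as the requirements state.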

<emphasis>ibm,validate-flash-image</emphasis> The ibm,validate-flash-image RTAS call allows OS service code to determine if a candidate flash image is valid, if the partition has authority to update the flash image, and what the resulting flash levels will be after performing the update. R1--1. The ibm,validate-flash-image RTAS call must be implemented using the argument call buffer described in . Argument Call Buffer <emphasis>ibm,validate-flash-image</emphasis> Parameter Type Name Values In Token Token for ibm,validate-flash-image Number of Inputs 2 Number of Outputs 2 Buffer Ptr Real address of minimum 4 K buffer, contiguous in real memory Buffer Size Size in bytes of Buffer Out Status 990x: Extended Delay 0: Success -1: Hardware Error -2: Busy -3: Parameter Error -9002: Not authorized Update Results Token Token to identify what will happen if update is attempted with this token, described in Requirement .

R1--2. The ibm,validate-flash-image RTAS call Buffer Ptr parameter must be a real address representing the starting address of a minimum 4 K buffer, contiguous in real memory. R1--3. On input, the ibm,validate-flash-image RTAS call buffer pointed to by the Buffer Ptr parameter must contain the first 4 KB of the candidate flash image to be validated. R1--4. For the LPAR option: The ibm,validate-flash-image RTAS call buffer described in Requirement must be in the partition's RMA. R1--5. On exit from the ibm,validate-flash-image RTAS call, RTAS must place the following data in the buffer, starting at the address in the Buffer Ptr parameter: “MI”<sp> current-T-image <sp> current-P-image <0x0A> “MI”<sp> new-T-image <sp> new-P-image <0x00> “ML”<sp> current-T-image <sp> current-P-image <0x0A> “ML”<sp> new-T-image <sp> new-P-image <0x00> “MG”<sp>current-T-img-ga-date<sp>current-P-img-ga-date<0x0A> “MG”<sp>new-T-img-ga-date<sp>new-P-img-ga-date<0x0A> “MG”<sp>input-image-ga-date<0x0A> “ME”<sp>fw-service-entitlement-expiration-date<0x00> In Requirement , current-T-image and current-P-image are the fixpack microcode image names currently on the Temporary and Permanent sides, respectively, and new-T-image and new-P-image are the fixpack microcode image names that will exist in flash after a successful flash update with the candidate image. If the current flash image level is not known, the value provided for current-T-image and/or current-P-image is “UNKNOWN”. If the flash update function would not succeed, the values of new-T-image and new-P-image are the same as current-T-image and current-P-image, respectively. R1--6. On exit from the ibm,validate-flash-image RTAS call, the Update Results Token output must be updated with one of the values in . 
This list is in order; firmware must provide the first value in the list which would be true if an update is attempted:

Update Results Token Values
Token   Description
1       No update done, partition does not have authority to perform flash update
2       No update done, the candidate image is not valid for this platform
3       Current fixpack level is unknown, the new-T-image and new-P-image identifiers show what will exist in flash after update with this image
4       Current T side will be committed to P side before being replaced with new image, and the new image is downlevel from current image
5       Current T side will be committed to P side before being replaced with new image
6       T side will be updated with a downlevel image
7       No update done, the candidate image's release date is later than the system's firmware service entitlement date - service warranty period has expired
0       T side will be updated with a newer or identical image
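Interpreting the ibm,validate-flash-image outputs can be sketched as follows. The sketch assumes that the keywords (MI, ML, MG, ME) appear in the buffer without quotation marks, that fields are separated by a single space, that records are separated by 0x0A, and that the data ends at the first 0x00; the function names are hypothetical. The set of "no update done" tokens is taken from the Update Results Token values above.

```python
NO_UPDATE_TOKENS = {1, 2, 7}  # token values described as "No update done"


def parse_validate_buffer(buf):
    """Split the returned buffer into (keyword, [fields]) records."""
    text = bytes(buf).split(b"\x00", 1)[0].decode("ascii")
    records = []
    for line in text.split("\n"):
        if line:
            fields = line.split(" ")
            records.append((fields[0], fields[1:]))
    return records


def update_would_proceed(token):
    """True if the Update Results Token indicates the flash would be updated."""
    return token not in NO_UPDATE_TOKENS
```

For example, a buffer containing two MI records yields the current and new image names for the T and P sides, in that order.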

<emphasis>ibm,activate-firmware</emphasis> The ibm,activate-firmware RTAS call allows an OS to activate a new version of firmware that has been updated in the platform flash memory after the partition was started. R1--1. The ibm,activate-firmware RTAS call must be implemented using the argument call buffer described in . Argument Call Buffer <emphasis>ibm,activate-firmware</emphasis> Parameter Type Name Values In Token Token for ibm,activate-firmware Number of Inputs 0 Number of Outputs 1 Out Status 990x: Extended Delay 0: Success -1: Hardware Error -2: Busy, try again later -3: Parameter Error -9001: No valid FW available to activate

Software implementation Note: The OS should expect that a number of calls may be required to accomplish firmware activation, with “Busy, try again later” or “Extended Delay” return codes from all but the last call. The new version of firmware is not in use until a “Success” return. The OS may interleave calls to other RTAS functions between calls to this function.
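The repeated-call pattern described in this note might look like the following sketch. The callable rtas_activate is a hypothetical stand-in for however the OS issues the RTAS call and is assumed to return the Status value. The delay of 10 raised to the x milliseconds for a 990x Status, with x in 0-5, is borrowed from the convention stated elsewhere in this chapter and is an assumption for this call.

```python
import time


def extended_delay_ms(status):
    """Return the suggested delay in ms for a 990x Status, else None."""
    if 9900 <= status <= 9905:
        return 10 ** (status - 9900)
    return None


def activate_firmware(rtas_activate, sleep_ms=None, max_calls=1000):
    """Call rtas_activate() until it returns a final Status.

    rtas_activate is a hypothetical callable returning the RTAS Status.
    max_calls is an illustrative safety bound, not part of the architecture.
    """
    if sleep_ms is None:
        sleep_ms = lambda ms: time.sleep(ms / 1000.0)
    for _ in range(max_calls):
        status = rtas_activate()
        if status == -2:
            continue              # Busy, try again later
        delay = extended_delay_ms(status)
        if delay is not None:
            sleep_ms(delay)       # Extended Delay, wait before retrying
            continue
        return status             # 0, -1, -3 or -9001 ends the sequence
    raise TimeoutError("firmware activation did not complete")
```

Other RTAS calls may be issued between iterations; the sketch leaves that to the caller.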

SMP Support In a Symmetric Multiprocessor (SMP) system, the platform needs the ability to synchronize the clocks on all the processors. The timebase registers are synchronized by the platform before CPUs are given to the OS. R1--1. (Merged into Requirement ) R1--2. (Merged into Requirement )

<emphasis>stop-self</emphasis> The stop-self primitive causes a processor thread to stop processing OS or user code, and to enter a state in which it is only responsive to the start-cpu RTAS primitive. This is referred to as the RTAS stopped state. R1--1. A stop-self RTAS call must place the calling processor thread in the RTAS stopped state. This call must be implemented using the argument call buffer defined by . R1--2. RTAS must ensure that a processor thread in the RTAS stopped state does not checkstop or otherwise fail if a machine check or soft reset exception occurs. Processor threads in this state receive the exception, but must perform a null action and remain in the RTAS stopped state. Argument Call Buffer <emphasis>stop-self</emphasis> Parameter Type Name Values In Token Token for stop-self Number Inputs 0 Number Outputs 1 Out Status If successful, this call does not return -1: Hardware Error

Software Implementation Note: If this call succeeds, it does not return. The CPU thread waits for some other processor thread to issue a start-cpu targeted to this processor thread. Firmware Implementation Note: In an LPAR environment the state of the interrupt sub-system associated with this processor on entry to this call cannot be trusted. Although interrupts are masked as part of the RTAS call protocol, the caller may have left the processor configured as an interrupt server. Therefore, interrupt signals may be pending within the processor’s interrupt management area. These conditions need to be cleared prior to allocating this processor to another partition. R1--3. Platforms which support the enhanced stop-self RTAS behavior must include the name only “ibm,integrated-stop-self” OF property, under the /rtas node, and prior to placing a processor in the stopped state, flush and disable any caches/memory exclusively used by the issuing processor. Architecture Note: In Requirement , an exclusively used cache means that no other running processor currently needs the cache for normal operation, even if the cache could potentially be shared with other processors. An exclusively used memory means any main memory allocated local to the processor thread and thus not accessible by other processor threads. R1--4. Execution of the stop-self call by the last active processor thread must cause the firmware to recover all the resources owned by the executing OS image for use per platform policy.

<emphasis>start-cpu</emphasis> The start-cpu primitive is used to cause a processor thread which is currently in the RTAS stopped state to start processing at an indicated location. R1--1. A start-cpu RTAS call must remove the processor thread specified by the CPU_id parameter from the RTAS stopped state. This call must be implemented using the argument call buffer defined by . R1--2. The processor thread specified by the CPU_id parameter must be in the RTAS stopped state entered because of a prior call by that processor to the stop-self primitive. R1--3. When a processor thread exits the RTAS stopped state, it must begin execution in real mode, with the MSR in the same state as from a system reset interrupt (except for the MSRHV bit which is on if not running under a hypervisor and off if running under a hypervisor) at the real location indicated by the Start_location parameter, with register R3 set to the value of parameter Register_R3_contents and the MSR as defined in . All other register contents are indeterminate. Argument Call Buffer <emphasis>start-cpu</emphasis> Parameter Type Name Values In Token Token for start-cpu Number Inputs 3 Number Outputs 1 Cpu_id Token identifying the processor thread to be started, obtained from the value of the “ibm,ppc-interrupt-server#s” property for the CPU in the OF device tree Start_location Real address at which the designated CPU begins execution Register_R3_contents Value which is loaded into Register R3 before beginning execution at Start_location Out Status 0: Success -1: Hardware Error 3: Not enough resources available to start CPU

Note: Requirement applies to the start-cpu RTAS call. At the completion of start-cpu, the caches to be used by the specified processor must have been initialized and the state bits made accurate prior to beginning execution at the start address.

Machine State Register (MSR) State in Started Processor
Bit Number   Name       Initial Value upon start by start-cpu
0            SF         0 if RTAS instantiated in 32 bit mode, 1 if RTAS instantiated in 64 bit mode
1            Reserved   0
2            Reserved   0
3            HV         0 if running under hypervisor firmware, 1 if running in “SMP” mode
4:46         Reserved   0
47           ILE        0
48           EE         0
49           PR         0
50           FP         Implementation Dependent
51           ME         1
52           FE0        Implementation Dependent
53           SE         0
54           BE         0
55           FE1        Implementation Dependent
56           US         0
57           Reserved   0
58           IR         0
59           DR         0
60           Reserved   0
61           PMM        0
62           RI         0
63           LE         0
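As a worked illustration of the MSR table, the sketch below computes the bits with fixed initial values. It assumes the table uses the Power ISA numbering in which bit 0 is the most significant bit of the 64-bit register, and it leaves the implementation-dependent bits (FP, FE0, FE1) at 0 purely for illustration. The function names are hypothetical.

```python
def msr_bit(bit_number):
    """Mask for a bit in the table's numbering (bit 0 = most significant)."""
    return 1 << (63 - bit_number)


def initial_msr(rtas_64bit, under_hypervisor):
    """Fixed MSR bits on start-cpu; implementation-dependent bits left 0."""
    msr = msr_bit(51)                 # ME = 1
    if rtas_64bit:
        msr |= msr_bit(0)             # SF = 1 when RTAS is in 64 bit mode
    if not under_hypervisor:
        msr |= msr_bit(3)             # HV = 1 when not under a hypervisor
    return msr
```

Under these assumptions, a 64-bit RTAS without a hypervisor yields 0x9000000000001000, and a 32-bit RTAS under a hypervisor yields 0x1000.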

<emphasis>query-cpu-stopped-state</emphasis> The query-cpu-stopped-state primitive is used to query a different processor thread to determine its status with respect to the RTAS stopped state. R1--1. A query-cpu-stopped-state RTAS call must return the CPU_status of the processor thread specified by the Cpu_id parameter. This call must be implemented using the argument call buffer defined by . Argument Call Buffer <emphasis>query-cpu-stopped-state</emphasis> Parameter Type Name Values In Token Token for query-cpu-stopped-state Number Inputs 1 Number Outputs 2 Cpu_id Token identifying the processor thread to be queried, obtained from the value of the “ibm,ppc-interrupt-server#s” property for the CPU in the OF device tree Out Status 0: Success -1: Hardware Error -2: Hardware Busy, Try again later CPU_status 0: The processor thread is in the RTAS stopped state 1: stop-self is in progress 2: The processor thread is not in the RTAS stopped state

Firmware Implementation Note: RTAS serialization may be required between the stop-self and the query-cpu-stopped-state calls. Software Implementation Note: The OS performs a stop-self on the desired processor thread, then periodically calls query-cpu-stopped-state on another processor thread until the desired processor thread is stopped. Before calling set-power-level to power off the desired processor, or isolate the logical CPU, the platform requires that all processor threads be in the RTAS stopped state.
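The polling sequence in the Software Implementation Note can be sketched as follows. The callable query is a hypothetical wrapper around query-cpu-stopped-state that returns a (Status, CPU_status) pair; max_polls and pause are illustrative parameters, not part of the architecture.

```python
def wait_until_stopped(query, cpu_id, max_polls=100, pause=lambda: None):
    """Poll until the target thread reports the RTAS stopped state.

    query(cpu_id) is a hypothetical callable returning (Status, CPU_status).
    Returns True once CPU_status is 0, False if max_polls is exhausted.
    """
    for _ in range(max_polls):
        status, cpu_status = query(cpu_id)
        if status == -1:
            raise RuntimeError("query-cpu-stopped-state: hardware error")
        if status == 0 and cpu_status == 0:
            return True           # thread is in the RTAS stopped state
        # Status -2 (busy), CPU_status 1 (stop-self in progress) or
        # CPU_status 2 (not yet stopped): wait and ask again.
        pause()
    return False
```

Only after this returns True for every thread of the processor would the OS go on to power off or isolate it.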

Miscellaneous RTAS Calls

<emphasis>ibm,os-term</emphasis> This RTAS call is provided for the OS to indicate to the platform that it has terminated normal operation. A string of information is passed to the platform. A call to the ibm,os-term RTAS function implies the following to the platform: Any platform reporting and recovery policies may now take effect. The OS may no longer be issuing periodic event-scan requests, so surveillance monitoring does not continue. All devices not marked “used-by-rtas” are released by the OS (including, for example, native serial ports used by a service processor). The OS no longer responds to any EPOW events, so it is up to the platform to take any appropriate actions for such events. Due to the above implications, the platform may take actions (for example, a service processor “call home”) that could conflict with normal processing of further RTAS requests. However, since the OS has entered a “live halt” state, the list of RTAS functions that it still needs is relatively small. The list of RTAS functions that the platform might expect to see after ibm,os-term includes: nvram-fetch nvram-store display-character power-off ibm,power-off-ups system-reboot check-exception for machine checks (Although the OS may still react normally to a machine check condition by calling check-exception, it might not process a returned error log. It is allowable for check-exception to not return an extended log when in this state.) If a platform has a service processor, and a policy has been established for actions to be taken by the service processor upon receiving notice of OS termination, the service processor may complete those actions and a return to the CPU from this call may never occur. If the call does return, the OS performs its own termination policy. When the platform supports extended ibm,os-term behavior, the return to the RTAS will always occur unless there is a kernel assisted dump active as initiated by an ibm,configure-kernel-dump call. 
Platforms capable of supporting this extended ibm,os-term behavior will so indicate by presenting the “ibm,extended-os-term” RTAS property in the OF device tree. R1--1. RTAS must implement an ibm,os-term call using the argument call buffer defined by to receive a termination string from the OS. Argument Call Buffer <emphasis>ibm,os-term</emphasis> Parameter Type Name Values In Token Token for ibm,os-term Number Inputs 1 Number Outputs 1 Pointer to String NULL terminated string Out Status 0:success -1:hardware error -2:hardware busy, try again later

Platform Implementation Note: The string should be maintained in an error log which could be made accessible to a service location or saved in the platform for later remote access. R1--2. The ibm,os-term call must disable surveillance. R1--3. During the machine check and soft reset handlers, the platform must support access to the ibm,os-term RTAS function. R1--4. If the ibm,os-term call does not return to the caller, the platform must honor the partition_auto_restart system parameter value. R1--5. For platforms supporting extended ibm,os-term behavior, the ibm,os-term call must always return unless there is an active kernel assisted dump configured as specified by an ibm,configure-kernel-dump RTAS call. Platform implementation note: The ibm,os-term RTAS call allows for the case where the OS and platform may share an I/O device such as a TTY where the OS would have use of the device normally, and the platform use when the OS has terminated, such as to implement an error reporting call home function both in the OS and the platform. For proper sharing in such a case where extended behavior is supported, when the primary partition console is also used for the call-home by the platform, the platform should not initiate the call home until after the partition shuts down.

<emphasis>ibm,exti2c</emphasis> For support of platforms which require an external I2C bus, a special port to the service processor is required. The EXTI2C option is designed to control specific external devices. Designers cannot assume that an arbitrary I2C device may be substituted. The ibm,exti2c call provides a single channel to the I2C bus. Through this channel, software can read or write up to 256 bytes from/to an addressed resource within an address space between X’000000’ and X’FFFFFF’. Refer to the specification for the specific I2C device to determine what effect such operations may have. The Buffer Pointer argument is used to manage this channel across multiple ibm,exti2c RTAS calls. If the input Buffer Pointer value on a call is zero, the state of the channel is reset and any outstanding I2C operation is aborted. If the input Buffer Pointer has a different value from that of the last call, a new operation is started, with any previous operation being aborted. An input Buffer Pointer value that is the same as that used on the previous call indicates a continuation of the last operation, given that the Status of the last call was not 0 (success) or -1 (hardware error). These terminating statuses reset the channel. The calling software must manage serialization to the ibm,exti2c channel across multiple calls for the same I2C operation. A single ibm,exti2c operation may require an extended period of processing by background hardware. During this time, RTAS returns either a Status code of -2 or 990x. A Status of -2 indicates that RTAS may be capable of doing useful processing immediately. A Status code of 990x indicates that the platform requires an extended period of time to perform the operation. It is suggested that software delay for 10 raised to the x milliseconds before calling ibm,exti2c with the same Buffer Pointer value. However, software may call again earlier or later.
A Status code of -1 indicates either a general error associated with the local I2C hardware (service processor) or that the channel has been corrupted due to other error conditions not associated with the I2C operation. If the buffer is changed, as when an error code is returned, the RTAS Status code is 0 (success). R1--1. For the EXTI2C option: RTAS must implement an ibm,exti2c call using the argument call buffer defined by to allow communications with special hardware. ibm,exti2c Argument Call Buffer Parameter Type Name Values In Token Token for ibm,exti2c Number Inputs 1 Number Outputs 1 Buffer Pointer Real Address of data buffer Out Status 0:Success -1:hardware error -2:hardware busy, try again later -3: Parameter error 990x:Extended delay where x is a number 0-5

R1--2. For the EXTI2C option: The Buffer Pointer must point to a contiguous real storage area large enough to contain the I2C command and any associated data (maximum of 261 bytes). R1--3. For the EXTI2C option: The Buffer format for the write operation must be as defined in . EXTI2C Buffer Write Operation Format Condition Byte # Content On Call 0 0x00 1-3 Address of I2C resource 4 Length of op. (1-255 with 0 specifying 256) 5... Data On Return - I2C OK 0 Buffer unmodified 1-3 4 5... On Return - I2C error 0 0x01 1-3 Address of I2C resource 4 I2C operation error code

Firmware and Software Implementation Note: When the ibm,exti2c RTAS call write operation returns after the operation has been enqueued by the firmware but prior to completion by the hardware (therefore the operation status is truly not known), the ibm,exti2c RTAS call can return a Status of 0 (success) with the buffer unmodified. R1--4. For the EXTI2C option: The Buffer format for a read operation, if supported, must be as defined in . EXTI2C Buffer Read Operation Format (Optional) Condition Byte # Content On Call 0 0x80 1-3 Address of I2C resource 4 Length of op. (1-255 with 0 specifying 256) 5... Data On Return - I2C OK 0 0x80 1-3 Address of I2C resource 4 Length of op. (1-255 with 0 specifying 256) 5... Data On Return - I2C error 0 0x81 1-3 Address of I2C resource 4 I2C operation error code
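The write and read buffer layouts can be made concrete with a small sketch. The helper names are hypothetical, and the byte order of the 3-byte I2C resource address is not stated in the tables; most-significant byte first is assumed here. For a read, the data area is simply zero-filled on call.

```python
def encode_length(length):
    """Encode an operation length of 1-256; 256 is carried as 0."""
    if not 1 <= length <= 256:
        raise ValueError("length must be 1-256")
    return length & 0xFF


def _header(op, address, length):
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 3 bytes")
    # Assumption: address bytes 1-3 are stored most-significant first.
    return bytes([op]) + address.to_bytes(3, "big") + bytes([encode_length(length)])


def build_write(address, data):
    """Byte 0 = 0x00, bytes 1-3 address, byte 4 length, bytes 5... data."""
    return _header(0x00, address, len(data)) + bytes(data)


def build_read(address, length):
    """Byte 0 = 0x80, followed by room for `length` returned data bytes."""
    return _header(0x80, address, length) + bytes(length)


def i2c_error(buffer):
    """Return the I2C operation error code if byte 0 is 0x01 or 0x81."""
    return buffer[4] if buffer[0] & 0x01 else None
```

The largest buffer this produces is 5 header bytes plus 256 data bytes, which matches the 261-byte maximum in the buffer requirement.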

R1--5. For the EXTI2C option: If read operations are not supported and a read operation is attempted, then the platform must return a Status of -3. R1--6. For the EXTI2C option: The maximum total Extended Delay imposed by the ibm,exti2c command for a single I2C operation must be less than 2 seconds. R1--7. For the EXTI2C option: When the ibm,exti2c RTAS call returns an EXTI2C buffer containing an I2C operation error code, the RTAS Status code must be 0 (success). Firmware and Software Implementation Note: When the ibm,exti2c RTAS call returns after the operation has been enqueued by the firmware but prior to completion by the hardware (therefore the operation status is truly not known), the ibm,exti2c RTAS call can return a Status of 0 (success) with the buffer unmodified.

PowerPC External Interrupt Option The RTAS calls used to access the facilities of the PowerPC External Interrupt option need not be serialized by the calling OS. Other RTAS rules such as being called in real mode with interrupts disabled still apply. Note: These RTAS calls make the PowerPC External Interrupt option Logical Partition (LPAR) ready.

<emphasis>ibm,get-xive</emphasis> R1--1. For the PowerPC External Interrupt option: RTAS must implement an ibm,get-xive call using the argument call buffer defined by . R1--2. For the PowerPC External Interrupt option: The ibm,get-xive call must be reentrant to the number of processors on the platform. R1--3. For the PowerPC External Interrupt option: The ibm,get-xive argument call buffer for each simultaneous call must be physically unique. R1--4. For the PowerPC External Interrupt option: The ibm,get-xive call must return the current values of the server number and priority fields, as set by the last ibm,set-xive call (priority initialized to least favored level by firmware at boot), of the External Interrupt Vector Entry associated with the interrupt number provided as an input argument unless prevented by Requirement . R1--5. For the PowerPC External Interrupt option: The ibm,get-xive call must return the Status of -3 (Argument Error) for an unimplemented Interrupt # (not reported via an “interrupt-ranges” property). R1--6. For the PowerPC External Interrupt option combined with the Platform Reserved Interrupt Priority Level option: The ibm,get-xive call must return the Status of -3 (Argument Error) for a platform reserved interrupt priority (reported via the “ibm,plat-res-int-priorities” property). Argument Call Buffer <emphasis>ibm,get-xive</emphasis> Parameter Type Name Values In Token Token for ibm,get-xive Number Inputs 1 Number Outputs 3 Interrupt # From “interrupt-ranges” property Out Status 0: Success -1: Hardware Error -3: Argument Error (Optional) Server # 0x0 - 2^“ibm,interrupt-server#-size” Priority 0x0 - 0xff

<emphasis>ibm,set-xive</emphasis> R1--1. For the PowerPC External Interrupt option: RTAS must implement an ibm,set-xive call using the argument call buffer defined by . R1--2. For the PowerPC External Interrupt option: The ibm,set-xive call must be reentrant to the number of processors on the platform. R1--3. For the PowerPC External Interrupt option: The ibm,set-xive argument call buffer for each simultaneous call must be physically unique. R1--4. For the PowerPC External Interrupt option: The ibm,set-xive call must set values of the server number and priority fields of the External Interrupt Vector Entry (XIVE) and/or firmware saved priority value (if the interrupt source controller does not use an Interrupt Enable Register and the interrupt source is masked off, either due to a previous ibm,int-off call or because the interrupt source was never enabled with an ibm,int-on call since boot), associated with the interrupt number provided as an input argument unless prevented by Requirement . R1--5. For the PowerPC External Interrupt option: The ibm,set-xive call must return the Status of -3 (Argument Error) for an unimplemented Interrupt number. R1--6. For the PowerPC External Interrupt plus the Platform Reserved Interrupt Priority Level option: The ibm,set-xive call must return the Status of -3 (Argument Error) for a reserved Priority value (as reported via an “ibm,plat-res-int-priorities” property). Argument Call Buffer <emphasis>ibm,set-xive</emphasis> Parameter Type Name Values In Token Token for ibm,set-xive Number Inputs 3 Number Outputs 1 Interrupt # Interrupt number from appropriate OF device tree property Server # 0x00 - 2^“ibm,interrupt-server#-size” Priority 0x00 - 0xFF Out Status 0: Success -1: Hardware Error -3: Argument Error (Optional)

<emphasis>ibm,int-off</emphasis> R1--1. For the PowerPC External Interrupt option: RTAS must implement an ibm,int-off call using the argument call buffer defined by . R1--2. For the PowerPC External Interrupt option: The ibm,int-off call must be reentrant to the number of processors on the platform. R1--3. For the PowerPC External Interrupt option: The ibm,int-off argument call buffer for each simultaneous call must be physically unique. R1--4. For the PowerPC External Interrupt option: The ibm,int-off call must disable interrupts from the interrupt source associated with the interrupt number provided as an input argument unless prevented by Requirement . R1--5. For the PowerPC External Interrupt option: If the interrupt source controller uses an Interrupt Enable Register, the ibm,int-off call must reset the mask bit associated with the specified interrupt number; or if the interrupt source controller does not use an Interrupt Enable Register, the ibm,int-off call must save the priority value of the XIVE for later restoration by the ibm,int-on call, or presentation by the ibm,get-xive call and set the priority value of the XIVE to the least favored priority value (0xFF), unless prevented by Requirement . R1--6. For the PowerPC External Interrupt option: The ibm,int-off call must return the Status of -3 (Argument Error) for an unimplemented Interrupt number. Argument Call Buffer <emphasis>ibm,int-off</emphasis> Parameter Type Name Values In Token Token for ibm,int-off Number Inputs 1 Number Outputs 1 Interrupt # Interrupt number from appropriate OF device tree property Out Status 0: Success -1: Hardware Error -3: Argument Error (Optional)

<emphasis>ibm,int-on</emphasis> R1--1. For the PowerPC External Interrupt option: RTAS must implement an ibm,int-on call using the argument call buffer defined by . R1--2. For the PowerPC External Interrupt option: The ibm,int-on call must be reentrant to the number of processors on the platform. R1--3. For the PowerPC External Interrupt option: The ibm,int-on argument call buffer for each simultaneous call must be physically unique. R1--4. For the PowerPC External Interrupt option: The ibm,int-on call must enable interrupts from the interrupt source associated with the interrupt number provided as an input argument unless prevented by Requirement . R1--5. For the PowerPC External Interrupt option: If the interrupt source controller uses an Interrupt Enable Register, the ibm,int-on call must set the mask bit associated with the specified interrupt number; or if the interrupt source controller does not use an Interrupt Enable Register, the ibm,int-on call must restore the XIVE priority value saved by the previous ibm,int-off call (initialized by the firmware to the least favored level at boot) unless prevented by Requirement . R1--6. For the PowerPC External Interrupt option: The ibm,int-on call must return the Status of -3 (Argument Error) for an unimplemented Interrupt number. Argument Call Buffer <emphasis>ibm,int-on</emphasis> Parameter Type Name Values In Token Token for ibm,int-on Number Inputs 1 Number Outputs 1 Interrupt # Interrupt number from appropriate OF device tree property Out Status 0: Success -1: Hardware Error -3: Argument Error (Optional)
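The interaction between ibm,set-xive, ibm,get-xive, ibm,int-off and ibm,int-on for a source controller without an Interrupt Enable Register can be modeled with a small sketch. The class and attribute names are hypothetical, and the model is one reading of the requirements above rather than a firmware implementation; in particular, writing the server number straight through while the source is masked is an assumption.

```python
LEAST_FAVORED = 0xFF


class XiveSource:
    """Toy model of one interrupt source without an Interrupt Enable Register."""

    def __init__(self):
        self.server = 0
        self.priority = LEAST_FAVORED         # value held in the XIVE itself
        self.saved_priority = LEAST_FAVORED   # firmware saved priority value
        self.masked = True                    # never enabled since boot

    def set_xive(self, server, priority):
        self.server = server
        if self.masked:
            self.saved_priority = priority    # applied later by int_on
        else:
            self.priority = priority

    def get_xive(self):
        # Reports the priority as set by the last set_xive call.
        shown = self.saved_priority if self.masked else self.priority
        return self.server, shown

    def int_off(self):
        if not self.masked:
            self.saved_priority = self.priority
            self.priority = LEAST_FAVORED
            self.masked = True

    def int_on(self):
        if self.masked:
            self.priority = self.saved_priority
            self.masked = False
```

In this model a set_xive issued before the first int_on leaves the XIVE at 0xFF while get_xive already reports the requested priority, and int_off followed by int_on restores it.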

MSI Support This section describes the RTAS calls required when the MSI option is implemented. See for other platform requirements for the MSI option. The Message Signaled Interrupt (MSI) and Enhanced MSI (MSI-X) capability of PCI IOAs in many cases allows for greater flexibility in assignment of external interrupts to IOAs than the predecessor Level Sensitive Interrupt (LSI) capability, and in some cases allows the treatment of MSIs as a resource pool that is reassigned based on availability of MSIs and the need of an IOA function for more interrupts than initially assigned. This architecture refers generically to the MSI and MSI-X capabilities as simply “MSI,” except where differentiation is required. Platforms that implement the MSI option implement the ibm,change-msi and ibm,query-interrupt-source-number RTAS calls.

<emphasis>ibm,change-msi</emphasis> The OS uses the ibm,change-msi RTAS call to query the initial number of MSIs assigned to a PCI configuration address (that is, to an IOA function) and to request a change in the number of MSIs assigned, when the platform allows for dynamic reassignment of MSIs for the IOA function. The ibm,change-msi RTAS call allows the caller to allow the platform to select MSI or MSI-X, to specifically select MSI or MSI-X or, if LSIs are allocated by the firmware for the IOA function, to change to LSI (by removal of the MSIs assigned). The interrupt source numbers assigned to an IOA function are queried via the ibm,query-interrupt-source-number RTAS call. The ibm,query-interrupt-source-number RTAS call is called iteratively, once for each interrupt assigned to the IOA function. The interrupt source numbers returned by the ibm,query-interrupt-source-number RTAS call are the numbers used to control the interrupt as in the ibm,get-xive, ibm,set-xive, ibm,int-on, and ibm,int-off RTAS calls. If a device driver is willing to live with the platform-assigned initial number of MSIs, then the device driver does not need to use the ibm,change-msi RTAS call, and can instead use the ibm,query-interrupt-source-number RTAS call to determine the number of interrupts assigned to each IOA function. An OS may abandon the effort to change the MSIs for a given configuration address after the first call to ibm,change-msi and prior to a call which gets a status back indicating completion, by calling again with the same PCI configuration address but with a Function number of 2 (set to default number of interrupts) and a Sequence Number of 1. RTAS never returns a Status of -2 or 990x when the call is made with a Function number of 2. 
If an OS successfully changes the number of interrupts, then it should consider removing the increase when it deconfigures the IOA function, especially if it starts with zero and wants to be backward compatible with older device drivers that may not understand MSIs. To remove all MSIs, set the Requested Number of Interrupts to zero. But it should be noted, that once set to zero, there is no guarantee that on a future request there will be any MSIs available to assign from the pool. Adding MSIs to an IOA function which has LSIs assigned disables those LSIs but does not remove them, and then removing the MSIs that replaced the LSIs re-uses the same (previously removed) LSIs (mapped to the same LSI source numbers as the previous LSI source numbers). The presence of the “ibm,change-msix-capable” property specifies that the platform implements the version of this RTAS call that allows Number Outputs equal to 4 and Functions 3, 4 and 5. If the ibm,change-msi RTAS call is made with Number Outputs equal to 4 or with Function equal to 3 or 4 when the “ibm,change-msix-capable” property does not exist in the /rtas node, then the call will return a Status of -3 (Invalid Parameter). Specifying Function 3 (MSI) also disables MSI-X for the specified IOA function, and likewise specifying Function 4 (MSI-X) disables MSI for the IOA function. It is unnecessary to specify a Requested Number of Interrupts of zero when switching between MSI and MSI-X. Specifying the Requested Number of Interrupts to zero for either Function 3 or 4 removes all MSI & MSI-X interrupts from the IOA function. It is permissible to use LSI, MSI and MSI-X on different IOA functions. The default (initial) assignment of interrupts is defined in . R1--1. For the MSI option: The platform must implement the ibm,change-msi call using the argument call buffer defined by . <emphasis>ibm,change-msi</emphasis> Argument Call Buffer Parameter Type Name Values In Token Token for ibm,change-msi. 
Number Inputs 6 Number Outputs 3 or 4, when the “ibm,change-msix-capable” property is present, 3 otherwise. Config_addr Configuration Space Address (Register field set to 0) PHB_Unit_ID_Hi Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr PHB_Unit_ID_Low Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr Function Determines action of this call: 0: Query only (only return actual number of MSI or MSI-X interrupts assigned to the PCI configuration address). 1: If Number Outputs is equal to 3, request to set to a new number of MSIs (including set to 0). If the “ibm,change-msix-capable” property exists and Number Outputs is equal to 4, request is to set to a new number of MSI or MSI-X (platform choice) interrupts (including set to 0). 2: Request to set back to the default number of interrupts (also aborts a change in progress; that is, one that has previously returned a Status of -2 or 990x) 3: (Only valid if “ibm,change-msix-capable” exists): Request to set to a new number of MSIs (including set to 0) 4: (Only valid if “ibm,change-msix-capable” exists): Request to set to a new number of MSI-X interrupts (including set to 0) 5: (Only valid if “ibm,change-msix-capable” exists): Request to set to a new number of 32 bit MSI (including set to 0) disregarding the adapter capability to support 64 bit MSI. Requested Number of Interrupts The total number of MSIs being requested for the PCI configuration address. A value of 0 is specified in order to remove all MSIs for the PCI configuration address. This input parameter is ignored by RTAS for Function values other than 1, 3, 4 or 5. Sequence Number Integer representing the sequence number of the call. First call in sequence starts with 1, following calls (if necessary) use the Next Sequence Number returned from the previous call. 
Out Status -3: Parameter error -2: Call again -1: Hardware error 0: Success 990x: Extended Delay Final Number of Interrupts Number of interrupts assigned to the PCI configuration address at the successful completion of this call ( Status of 0). For Function 1, 3, or 4, if a greater number was requested than what was previously assigned, the final number may be less than what was requested, even though a Status of 0 is returned. Next Sequence Number Integer to be returned as the Sequence Number parameter on the next call. This output is only valid if a Status of -2 or 990x is returned. Type of Interrupts This field is only valid when the Final Number of Interrupts is non-zero. 1: MSI 2: MSI-X

R1--2. For the MSI option: The Final Number of Interrupts and Type of Interrupts must be valid when the platform returns a Status of 0 (Success), regardless of whether the original number and final number of interrupts assigned is different and regardless of whether or not the platform allows MSI resources to be reassigned for the specified PCI configuration address. R1--3. For the MSI option: The platform must return a Status of -3 (Parameter error) from ibm,change-msi, with no change in interrupt assignments if the PCI configuration address does not support MSI and Function 3 was requested (that is, the “ibm,req#msi” property must exist for the PCI configuration address in order to use Function 3), or does not support MSI-X and Function 4 is requested (that is, the “ibm,req#msi-x” property must exist for the PCI configuration address in order to use Function 4), or if neither MSIs nor MSI-Xs are supported and Function 1 is requested. R1--4. For the MSI option: If there are zero MSIs assigned to the target IOA function but there is one or more LSIs assigned, then a call to ibm,change-msi which successfully changes the number of MSIs assigned to non-zero must also disable the LSIs in the IOA function’s configuration space and must keep the LSI platform resources available to the IOA function in the case the MSIs are removed (see Requirement ). R1--5. For the MSI option: If there are a non-zero number of MSIs assigned to the target IOA function and if that IOA function originally had some LSIs assigned, then a call to ibm,change-msi which successfully changes the number of MSIs assigned to zero must also reassign any LSIs that were originally assigned to that IOA function, using the same interrupt number that was originally assigned (that is, the platform must reserve an originally assigned LSI for an IOA function in case it needs to reassign it), and must enable the LSIs in the IOA function’s configuration space. R1--6.
For the MSI option: If the platform supports the changing of MSIs, then it must support the reduction in the number of interrupts by the ibm,change-msi call, including setting the number of MSIs to 0. R1--7. For the MSI option: On the first call of ibm,change-msi, the Sequence Number must be a 1. R1--8. For the MSI option: If ibm,change-msi returns a Status of -2 (Call again) or 990x (Extended Delay), then the caller must provide on the next call to ibm,change-msi, one of the following: All input parameters the same as the initial call except with the Sequence Number set to the value in Next Sequence Number returned from the previous ibm,change-msi call. All input parameters the same as the initial call except with Function set to 2 and the Sequence Number set to 1, if the caller is wanting to abort the previously started ibm,change-msi operation. R1--9. For the MSI option: If the ibm,change-msi RTAS call returns something other than 0 for the Final Number of Interrupts, then the ibm,query-interrupt-source-number RTAS call must be used to get the current interrupt source numbers, even if the ibm,change-msi call has returned the same number of interrupts as before the call. R1--10. For the MSI option: Firmware must not return a Status of -2 or 990x when the Requested Number of Interrupts is set to 0 or for Function 0 (query only) or for Function 2 (set back to default number). R1--11. For the MSI option: When the set-indicator RTAS call is made to isolate an IOA (for both DLPAR and PCI Hot Plug operations), the platform must release any additional MSI numbers that were obtained through the ibm,change-msi RTAS call and make them available for use by other ibm,change-msi calls. R1--12. 
For the MSI option: An OS or device driver that is calling ibm,change-msi for the purpose of changing the number or type of interrupts for the IOA function must ensure that the IOA function cannot be actively performing operations that will generate interrupts during the process of changing the number or type of interrupts. R1--13. For the MSI option: The platform must restore the IOA’s MSI configuration space after a reset operation which occurs following boot, to what it was previous to the reset operation, and provide the same MSI assignments through the reset operation, unless a DR isolate/unisolate operation has been performed (in which case the IOA's MSI configuration space is set as it would be at boot time). Software Implementation Notes: Interrupt source numbers for MSIs are not necessarily assigned contiguously. MSIs and MSI source numbers are not shared (see Requirement ). For a multi-function IOA, the ibm,change-msi call is called for each function for which the number of MSIs is to be changed. When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling ibm,change-msi again. However, software may issue the ibm,change-msi call again either earlier or later than this. During a sequence of calls which return -2 or 990x, software may abort at any time by setting the Function equal to 2 and the Sequence Number to 1. When there is a non-zero number of MSI or MSI-X interrupts assigned, and when software attempts to change the type of interrupts (MSI to MSI-X interrupt or MSI-X to MSI) at the same time as changing the number of interrupts, the platform may return the same number of interrupts as previously assigned, even though a greater number is available. In this case a second call to ibm,change-msi to increase the number of interrupts may produce a greater number of interrupts. 
The platform returns a Status of -2 or 990x only when the OS has indicated that it supports these values. The OS indicates support via ibm,client-architecture-support, vector 4. See the section on "Root Node Methods" for more information.
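
The retry rules above (Sequence Number starting at 1, Next Sequence Number fed back on a -2 or 990x Status, and the suggested delay of 10 raised to the x milliseconds) can be sketched as follows. This is a minimal illustration, not a reference implementation: the `rtas_change_msi` callable, its argument order, and its three-element return tuple are hypothetical stand-ins for whatever wrapper an OS provides around the ibm,change-msi call.

```python
import time

CALL_AGAIN = -2  # Status -2 (Call again)


def extended_delay_ms(status):
    """Return the suggested delay in milliseconds for a 990x Status, else None.

    x is taken as the last decimal digit of the Status, as the note suggests.
    """
    if 9900 <= status <= 9909:
        return 10 ** (status % 10)
    return None


def change_msi_with_retry(rtas_change_msi, config, function, requested,
                          sleep=time.sleep):
    """Drive a hypothetical ibm,change-msi wrapper until it stops asking to be recalled.

    rtas_change_msi(config, function, requested, seq) is assumed to return
    (status, final_number_of_interrupts, next_sequence_number).
    """
    seq = 1  # the first call must use Sequence Number 1
    while True:
        status, final_count, next_seq = rtas_change_msi(
            config, function, requested, seq)
        delay = extended_delay_ms(status)
        if delay is not None:
            # The delay is only a suggestion; software may call earlier or later.
            sleep(delay / 1000.0)
        elif status != CALL_AGAIN:
            return status, final_count
        # Same inputs as the initial call, with the returned sequence number.
        seq = next_seq
```

Injecting `sleep` keeps the sketch testable; aborting an operation in progress (Function 2 with Sequence Number 1) is left out for brevity.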

<emphasis>ibm,query-interrupt-source-number</emphasis> The ibm,query-interrupt-source-number RTAS call is used to query the interrupt source number and type (level sensitive for LSIs, edge triggered for MSIs) for a specific PCI IOA function’s interrupt, if one exists. The interrupt is identified by a PCI configuration address ( PHB_Unit_ID_Hi, PHB_Unit_ID_Low, and config_addr) and a function interrupt number. This call is issued once for each interrupt of each IOA function, in order to obtain the interrupt source number and type for that interrupt. For example, if the ibm,change-msi RTAS call has previously returned a value of “n” interrupts for the IOA function, then the call is made “n” times for that function (with a relative interrupt number of 0 to n-1). R1--1. For the MSI option: The platform must implement the ibm,query-interrupt-source-number RTAS call using the argument call buffer defined by .

Argument Call Buffer <emphasis>ibm,query-interrupt-source-number</emphasis>

Parameter Type | Name | Values
In | Token | Token for ibm,query-interrupt-source-number
In | Number Inputs | 4
In | Number Outputs | 3
In | Config_addr | Configuration Space Address (Register field set to 0)
In | PHB_Unit_ID_Hi | Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | PHB_Unit_ID_Low | Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | IOA Function Interrupt Number | The relative number of the interrupt within the PCI configuration address, with a value of 0 being the first interrupt of the PCI configuration address.
Out | Status | -3: Parameter error; -1: Hardware error; 0: Success; 1: No interrupt assigned for the given PCI configuration address and IOA Function Interrupt Number.
Out | Interrupt Source Number | The interrupt source number corresponding to the PCI configuration address and IOA Function Interrupt Number, when a Status of 0 is returned. Undefined for other Status values.
Out | Interrupt Source Trigger | The interrupt source trigger corresponding to the PCI configuration address and IOA Function Interrupt Number, when a Status of 0 is returned. Undefined for other Status values. 0: Level sensitive; 1: Edge triggered

R1--2. For the MSI option: The interrupt source numbers returned by the ibm,query-interrupt-source-number RTAS call must be the numbers used to control the interrupt as in the ibm,get-xive, ibm,set-xive, ibm,int-on, and ibm,int-off RTAS calls. R1--2. For the MSI option: The ibm,query-interrupt-source-number RTAS call must return a Status of 1 (no interrupt assigned) if the inputs specify a valid PCI configuration address and the PCI configuration address does not have an interrupt assigned for the specified IOA Function Interrupt Number. Software and Firmware Implementation Note: Software may use the ibm,query-interrupt-source-number RTAS call for all IOA Function Interrupt Number values starting at 0 until a Status of 1 is returned, rather than using ibm,change-msi Function 0 (query). That is, the ibm,query-interrupt-source-number RTAS call works even when the “ibm,req#msi” property does not exist for the IOA (that is, even when the IOA is not requesting one or more MSIs). This might be desirable, for example, if software never plans on using the capability to change the number of MSIs, and therefore does not have any other use for the ibm,change-msi call.
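
The enumeration described in the note above (query IOA Function Interrupt Number 0, 1, 2, ... until a Status of 1 is returned) might look like the following sketch. The `query` callable and its return tuple are hypothetical stand-ins for an OS wrapper around ibm,query-interrupt-source-number, and the `limit` safety cap is an assumption of this example rather than anything the architecture defines.

```python
def enumerate_interrupt_sources(query, config_addr, phb_hi, phb_lo, limit=4096):
    """Collect (source_number, trigger) pairs for one PCI configuration address.

    query(config_addr, phb_hi, phb_lo, n) is assumed to return
    (status, interrupt_source_number, interrupt_source_trigger).
    """
    sources = []
    for n in range(limit):  # limit is an arbitrary guard against a runaway loop
        status, source, trigger = query(config_addr, phb_hi, phb_lo, n)
        if status == 1:
            # Status 1: no interrupt assigned for this relative number; stop.
            break
        if status != 0:
            # -1 (Hardware error) or -3 (Parameter error); outputs are undefined.
            raise RuntimeError("query failed with Status %d" % status)
        # Trigger 0 is level sensitive (LSI), 1 is edge triggered (MSI).
        sources.append((source, "edge" if trigger == 1 else "level"))
    return sources
```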

Enhanced I/O Error Handling (EEH) Option Functions The EEH option requires several additional RTAS calls. In addition, the Error Injection option RTAS calls are required to be implemented, in order to be able to test device driver code that implements recovery based on the EEH option. See also, , for additional information about implementing EEH error recovery. R1--1. For the EEH option: The IOA bus error injection function of the Error Injection option RTAS call must be implemented concurrently with the EEH option (that is, the ioa-bus-error token must exist in the “ibm,errinjct-tokens” property). R1--2. For the EEH option: If the EEH option is implemented for the specified PE configuration address, then calls to the ibm,set-eeh-option, ibm,set-slot-reset, and ibm,slot-error-detail RTAS calls must be governed by , otherwise if one of the invalid transitions in is attempted, then return a Status as defined by . R1--3. If the EEH option is not implemented for the specified PE configuration address and a call is made to one of the ibm,set-eeh-option, ibm,set-slot-reset, or ibm,slot-error-detail RTAS calls, then return a Status of -3 (parameter error). Software Implementation Note: Some transitions in are made asynchronously to the OS by the platform (in exceptional cases; see table for details). If software receives a Status of -7 (Unexpected state change) on an RTAS call which is attempting to change state in , then software should read the state again via the ibm,read-slot-reset-state2 RTAS call, in order to obtain the current state. Some legacy implementations may return a -1 instead of a -7. R1--4. 
For the EEH option: If the platform activates the reset to a PE (for example, as part of a recovery action above the PE), including the case where the platform has temporarily deactivated and then reactivated the reset, then the platform must hide such PE state transition(s) from the OS by returning a Status of 5 (PE is unavailable) with PE Unavailable Info indicating a non-zero value (temporarily unavailable) for the ibm,read-slot-reset-state2 RTAS call, until which time the required minimum reset active hold time for the hardware within the PE has been met. Software Implementation Note: Relative to the platform automatically resetting the PE as part of error recovery, as mentioned in Requirement , the PE Recovery Info output of the ibm,read-slot-reset-state2 RTAS call is provided to enable the software to determine that such a reset has occurred. R1--5. For the EEH option: If the platform deactivates the reset to a PE, except in the case where the OS has instructed it to do so with the ibm,set-slot-reset Function 0, then the platform must do all the following: Hide such a deactivation from the OS during the time that the PE reset is deactivated by returning a Status of 5 (PE is unavailable) with PE Unavailable Info indicating a non-zero value (temporarily unavailable) for the ibm,read-slot-reset-state2 RTAS call. Force OS MMIO accesses to the PE during the deactivation time to look like the PE is reset. Prevent the PE from introducing errors into the system (for example, from DMA or due to the reset being deactivated prior to the proper active hold time). Reactivate the reset, hiding the reset active hold time as required by Requirement , or force the PE into the permanently unavailable state (return a Status of 5 (PE is unavailable) with PE Unavailable Info indicating a zero value for the ibm,read-slot-reset-state2 RTAS call). R1--6. For the EEH option: The Bridged-I/O EEH option must be implemented concurrently with the EEH option. R1--7. 
For the EEH option: The 64 bit IOA bus error injection function of the Error Injection option RTAS call must be implemented concurrently with the EEH option (that is, the ioa-bus-error-64 token must exist in the “ibm,errinjct-tokens” property). R1--8. For the EEH option: The platform must implement the ibm,configure-pe RTAS call. PE State Transition Table Initial PE State The state as would be returned from ibm,read-slot-reset-state2, when no asynchronous platform transition has occurred. Final PE State 0 Not Reset Load/ Store Allowed Load/ Store allowed means the Loads or Stores channel is open to the PE, and not necessarily that the PE itself has its MMIO space enabled. The components within the PE also contain enable/disable bits (for example, the PCI configuration space Memory Space and IO Space enable bits in the PCI header Device Control register). DMA Allowed DMA allowed means the DMA channel is open to the PE, if the PE itself has its DMA enabled. The components within the PE also contain enable/disable bits (for example, the PCI configuration space Bus Master enable bit in the PCI header Device Control register). (Normal Operations) 1 Reset Reset may mean that the PE is being held in the reset state by a reset signal or Hot Reset (PCI Express), or that it may have been put into this state by the platform via a PCI Express Function Level Reset (FLR) in response to the ibm,set-slot-reset RTAS call. In the case of FLR, the platform makes the pulse of the FLR look like the Hot Reset case to the OS in terms of the Reset state (See for more information). Note that the platform does not monitor writes to the FLR bit of an IOA, and so OS or Device Driver writes directly to the FLR bit on an IOA will not affect the PE State as shown in . Load/ Store Disabled Load/ Store disabled means that either the PE is in the MMIO Stopped state or that the PE is reset, the latter giving the appearance of being MMIO Stopped. 
DMA Disabled DMA disabled means that either the PE is in the DMA Stopped state or that the PE is reset, the latter giving the appearance of being DMA Stopped. 2 Not Reset Although the current state is “Not Reset”, the PE may have been reset by the platform in the process of getting to this state. The PE Recovery Info output of the ibm,read-slot-reset-state2 RTAS call will indicate if the platform has done such a reset. Load/ Store Disabled In the MMIO Stopped state. DMA Disabled In the DMA Stopped state. 4 Not Reset Load/ Store Allowed DMA Disabled 5 Temporarily Unavailable Temporarily unavailable is signaled by a non-zero value returned in the PE Unavailable Info of the ibm,read-slot-reset-state2 RTAS calls. 5 Permanently Unavailable Permanently unavailable is signaled by a zero value returned in the PE Unavailable Info of the ibm,read-slot-reset-state2 RTAS calls. 0 Not Reset Load/ Store Allowed DMA Allowed (Normal Operations) ibm,set-slot-reset Function 1 or 3, or via a hardware initiated action The hardware would not normally initiate this transition; such implementation would only exist where the platform only implements EEH Stopped State via a reset (always) of the PE (that is, only implements states 0, 1, and 5 of this table), which is not applicable for LoPAR Compliant platforms. Hardware causes this state transition when EEH is enabled and an error occurs or firmware may cause due to higher level error recovery action Not a valid transition Platform initiated action ibm,slot-error-detail Function 2, or platform detected permanent error 1 Reset Load/ Store Disabled DMA Disabled ibm,set-slot-reset Function 0 The platform also removes the PE from the EEH Stopped state, if applicable, on the transition from state 1 to state 0. 
ibm,set-slot-reset Function 1 or 3 Not a valid transition Not a valid transition Platform initiated action ibm,slot-error-detail Function 2, or platform detected permanent error 2 Not Reset Load/ Store Disabled DMA Disabled Not a valid transition Must go through state 4 or state 1 ibm,set-slot-reset Function 1 or 3, or via a hardware initiated action ibm,set-eeh-option Function 2 Platform initiated action ibm,slot-error-detail Function 2, or hardware detected permanent error 4 Not Reset Load/ Store Allowed DMA Disabled ibm,set-eeh-option Function 3 See Requirement ibm,set-slot-reset Function 1 or 3, or via a hardware initiated action Hardware causes this state transition when EEH is enabled and an error occurs or firmware may cause due to higher level error recovery action Platform initiated action ibm,slot-error-detail Function 2, or platform detected permanent error 5 Temporarily Unavailable Not a valid transition Platform initiated action Platform initiated action This transition cannot occur if the PE was in the reset state prior to the platform transition to the temporarily unavailable state. See Requirement . . Not a valid transition ibm,slot-error-detail Function 2, or platform detected permanent error 5 Permanently Unavailable Power cycle, Partition reboot, or DLPAR re-assignment Not a valid transition Not a valid transition Not a valid transition Not a valid transition

depicts the four main functions for controlling a PE’s state: ibm,set-eeh-option Function 2 for releasing a PE from the MMIO Stopped State, when the PE is in State 2 (MMIO Stopped State and DMA Stopped State both active). ibm,set-eeh-option Function 3 for releasing a PE from the DMA Stopped State, when the PE is in State 4 (MMIO Stopped State not active and DMA Stopped State active). ibm,set-slot-reset Function 0 for releasing a PE’s reset. ibm,set-slot-reset Function 1 for activating a PE’s reset. Implementation Note: In the last two bullets, above, for the case of the platform’s use of FLR for resetting the PE, “activating” and “deactivating” the PE’s reset have a slightly different meaning, but the platform makes the EEH recovery model transparent. See for more details. is a summary of the expected results of the above four operations. The -7 Status returns generally will occur for the cases where the PE state has been changed asynchronously to the OS by the platform. In these cases, software should read the state again (via the ibm,read-slot-reset-state2 RTAS call) in order to determine the current hardware state. PE State Control RTAS Call Function Result and Status Initial PE State The state as would be returned from ibm,read-slot-reset-state2, when no asynchronous platform transition has occurred. 0 Not Reset Load/ Store Allowed Load/ Store allowed means the Loads or Stores channel is open to the PE, and not necessarily that the PE itself has its MMIO space enabled. The components within the PE also contain enable/disable bits (for example, the PCI configuration space Memory Space and IO Space enable bits in the PCI header Device Control register). DMA Allowed DMA allowed means the DMA channel is open to the PE, if the PE itself has its DMA enabled. The components within the PE also contain enable/disable bits (for example, the PCI configuration space Bus Master enable bit in the PCI header Device Control register). 
(Normal Operations) 1 Reset Reset may mean that the PE is being held in the reset state by a reset signal or Hot Reset (PCI Express), or that it may have been put into this state by the platform via a PCI Express Function Level Reset (FLR) in response to the ibm,set-slot-reset RTAS call. In the case of FLR, the platform makes the pulse of the FLR look like the Hot Reset case to the OS in terms of the Reset state (See for more information). Note that the platform does not monitor writes to the FLR bit of an IOA, and so OS or Device Driver writes directly to the FLR bit on an IOA will not affect the PE State as shown in . Load/ Store Disabled Load/ Store disabled means that either the PE is in the MMIO Stopped state or that the PE is reset, the latter giving the appearance of being MMIO Stopped. DMA Disabled DMA disabled means that either the PE is in the DMA Stopped state or that the PE is reset, the latter giving the appearance of being DMA Stopped. 2 Not Reset Load/ Store Disabled In the MMIO Stopped state. DMA Disabled 4 Not Reset Load/ Store Allowed DMA Disabled In the DMA Stopped state. 5 Temporarily Unavailable Temporarily unavailable is signaled by a non-zero value returned in the PE Unavailable Info of the ibm,read-slot-reset-state2 RTAS calls. 5 Permanently Unavailable Permanently unavailable is signaled by a zero value returned in the PE Unavailable Info of the ibm,read-slot-reset-state2 RTAS calls. ibm,set-eeh-option Function 2 Release PE for Load/Store Result no-op no-op Transition from state 2 to 4 no-op no-op no-op Status A Status of -3 is returned instead of 0 or -7 if an invalid PCI configuration address is used. An invalid PCI configuration address is generally one which is not a PE address or which is not assigned to the OS. However, some platforms may allow resetting within the PE or outside the PE, providing this does not violate other requirements defined by this architecture. 
Also, some legacy implementations may return a -1 or -3 instead of a -7, but all implementations are required to implement the -7 Status, where appropriate. -3 -7 0 -3 -7 -7 Function 3 Release PE for DMA Result no-op no-op no-op Transition from state 4 to 0 no-op no-op Status -3 -7 -7 0 -7 -7 ibm,set-slot-reset Function 0 Deactivate the reset to the PE In the case of the use of FLR by the platform to reset the PE, the “activate” and “deactivate” of the reset has a different meaning than for the Hot Reset case. For FLR, the platform makes the pulse of the FLR look like the Hot Reset case to the OS in terms of the Reset state (See for more information). Result no-op Transition from state 1 to 0 The platform also removes the PE from the EEH Stopped state, if applicable, on the transition from state 1 to state 0. no-op no-op no-op no-op Status 0 0 -7 -3 -7 -7 Function 1 or 3 Activate the reset to the PE Result Transition from state 0 to 1 no-op Transition from state 2 to 1 Transition from state 4 to 1 no-op no-op Status , For Function 3, if Function 3 is not implemented, then a Status of -3 is returned. For Function 3, if implemented in the RTAS call, but not implemented for the specified PCI configuration address, then a Status of -8 is returned. In either of these cases, the PE state is not changed. If Function 3 is implemented, then the platform indicates this by the “ibm,reset-capabilities” property in the OF device tree. 0 0 0 0 -7 -7
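
As a reading aid, the Result and Status rows of the PE State Control summary above can be transcribed into a small lookup table. The sketch below is only that transcription; the state labels "5T" and "5P" (temporarily and permanently unavailable), the table layout, and the function name are inventions of this example. It also ignores the special Status values noted for Function 3 (-3 when not implemented, -8 when not implemented for the configuration address) and for invalid PCI configuration addresses.

```python
# Initial PE states, in the column order used by the summary table.
STATES = ("0", "1", "2", "4", "5T", "5P")
NOOP = None  # "no-op": the PE state does not change


def _row(results, statuses):
    """Pair each initial state with its (new state or NOOP, Status)."""
    return dict(zip(STATES, zip(results, statuses)))


TABLE = {
    ("ibm,set-eeh-option", 2): _row((NOOP, NOOP, "4", NOOP, NOOP, NOOP),
                                    (-3, -7, 0, -3, -7, -7)),
    ("ibm,set-eeh-option", 3): _row((NOOP, NOOP, NOOP, "0", NOOP, NOOP),
                                    (-3, -7, -7, 0, -7, -7)),
    ("ibm,set-slot-reset", 0): _row((NOOP, "0", NOOP, NOOP, NOOP, NOOP),
                                    (0, 0, -7, -3, -7, -7)),
    # Function 1 and Function 3 share one row ("Function 1 or 3").
    ("ibm,set-slot-reset", 1): _row(("1", NOOP, "1", "1", NOOP, NOOP),
                                    (0, 0, 0, 0, -7, -7)),
}


def expected_outcome(call, function, state):
    """Return (resulting_state, status) for a call made in the given state."""
    if call == "ibm,set-slot-reset" and function == 3:
        function = 1
    new_state, status = TABLE[(call, function)][state]
    return (state if new_state is NOOP else new_state), status
```

For example, `expected_outcome("ibm,set-eeh-option", 2, "2")` gives the transition from state 2 to state 4 with a Status of 0, matching the "Release PE for Load/Store" row.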

The PE configuration address ( PHB_Unit_ID_Hi, PHB_Unit_ID_Low, and config_addr) for the domain is the PCI configuration address for the PE primary bus and uses the same format as the ibm,read-pci-config and ibm,write-pci-config calls (see Requirement ), except that the Register field is set to 0. The PE configuration address is obtained as indicated in .

<emphasis>ibm,set-eeh-option</emphasis> This call is used to enable and disable the EEH domain of a PE, to remove a PE from the MMIO Stopped state to continue Load and Store operations to the domain, and to remove a PE from the DMA Stopped state to continue DMA operations to the domain. The PE configuration address ( PHB_Unit_ID_Hi, PHB_Unit_ID_Low, and config_addr) for the PE is obtained as defined in . R1--1. For the EEH option: RTAS must implement an ibm,set-eeh-option call using the argument call buffer defined by .

Argument Call Buffer <emphasis>ibm,set-eeh-option</emphasis>

Parameter Type | Name | Values
In | Token | Token for ibm,set-eeh-option
In | Number Inputs | 4
In | Number Outputs | 1
In | Config_addr | PE configuration address (Register fields set to 0)
In | PHB_Unit_ID_Hi | Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | PHB_Unit_ID_Low | Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | Function | 0: Disable EEH option for the PE (no-op for PEs with PCI Express IOAs); 1: Enable EEH option for the PE; 2: Release the PE for Load/ Store operations; 3: Release the PE for DMA operations; 4: Enable EEH io-event interrupt for this PE; 5: Disable EEH io-event interrupt for this PE
Out | Status | 0: Success; -1: Hardware Error; -3: Parameter Error; -7: Unexpected state change

Software and Platform Implementation Note: For platforms that enable EEH by default, ibm,set-eeh-option Function 0 (disable EEH) is a no-op. However, ibm,set-eeh-option Function 1 (enable EEH) is still required as a signalling method from the device driver to the platform that the device driver is at least EEH aware (see Requirement ). R1--2. For the EEH option: Software must use the ibm,get-config-addr-info2 RTAS call, when supported, to get the EEH domain span of the PE, otherwise software must use the ibm,read-slot-reset-state2 RTAS call in order to determine the span, and then software should attempt to perform all ibm,set-slot-reset and ibm,set-eeh-option RTAS calls appropriately, based on the EEH capabilities and as governed by . R1--3. For the EEH option: If the EEH option is implemented for the specified PE configuration address, on a call to the ibm,set-eeh-option with a Function of 0 (disable EEH) the platform must do one of the following: If any IOA in the PE is enabled (if any of the Bus Master, Memory Space or I/O Space bits in the Command Register of the IOA’s configuration space are 1), then do nothing and return a Status of 0 (Success). If the platform allows disabling of EEH and the disabling of EEH for the PE violates another requirement relative to LPAR, then the platform must not disable the EEH option for the specified PE configuration address and must return a -7 (unexpected state change) or a -1 (hardware error), with -7 preferred. If the platform allows disabling of EEH and the disabling of EEH for the PE does not violate the other requirements relative to LPAR, then clear the MMIO Stopped State and DMA Stopped State, and disable the EEH option, for the specified PE configuration address. If the default for the platform is EEH enabled, then do nothing and return a Status of 0 (Success). R1--4. 
For the EEH option: If the OS allows the DMA to be enabled for a PE that is in the DMA Stopped state without the use of a reset operation (that is, the use of the ibm,set-eeh-option with a Function of 3), the device driver must first do all necessary cleanup of its IOA to prevent the IOA from doing anything destructive when it starts DMA again. R1--5. For the EEH option: If a device driver is going to enable EEH and the platform has not defaulted to EEH enabled, then it must do so before it does any operations with its IOA, including any configuration cycles or Load or Store operations. R1--6. For the EEH option: If an EEH domain is enabled for a PE via the ibm,set-eeh-option RTAS call and if there are multiple IOAs or one or more multi-function IOAs in that PE, and if these functions are supported by multiple device drivers, then all of the device drivers for all the functions in that PE must be EEH enabled and be capable of coordinating EEH recovery procedures. Software implementation Note: Protection against startup errors (configuration cycles, etc.), are every bit as important as protection against errors during normal operations. Although the quantity of operations is not as great, there is more of a chance of latent errors showing up during the startup phase. R1--7. For the EEH option: If the Slot Level EEH Event Interrupt option is not implemented for the PE, then return a Status of -3 if Function 4 or 5 is attempted. R1--8. For the EEH with the Slot Level EEH Event Interrupt option: Function 4 and 5 must be implemented for all PE under nodes that contain the “ibm,io-events-capable” property.

<emphasis>ibm,set-slot-reset</emphasis> This call is used to reset a PE. The PE configuration address ( PHB_Unit_ID_Hi, PHB_Unit_ID_Low, and config_addr) for the PE is obtained as defined in . All PEs have the capability of being reset independently. Resets outside or within the PE are not architected, but may be allowed by the platform implementation, providing that it does not violate other requirements of this architecture. The platform may use one of two methods to reset a PCI Express PE, when the ibm,set-slot-reset RTAS call is made with the Function 1/ Function 0 (activate the reset/deactivate the reset). If the PE is a single function of a multi-function IOA, then the Function Level Reset (FLR) option is required to be implemented by the function, and the platform uses FLR to reset the function. When the platform uses FLR instead of Hot Reset to reset a PCI Express PE, the platform provides the “ibm,pe-reset-is-flr” property in the function’s OF Device Tree node, and provides the same EEH recovery model to the software, as in the Hot Reset case, and as defined by . The property is provided in the case where there may be slightly different device-specific reset recoveries by the software for the FLR case. Software Implementation Note: The platform does not monitor writes to the FLR bit of an IOA, and so OS or Device Driver writes directly to the FLR bit on an IOA will not affect the PE State as shown in . Otherwise, a PCI Express Hot Reset is used. R1--1. For the EEH option: The ibm,set-slot-reset call must be implemented using the argument call buffer defined by . 
Argument Call Buffer <emphasis>ibm,set-slot-reset</emphasis>

Parameter Type | Name | Values
In | Token | Token for ibm,set-slot-reset
In | Number Inputs | 4
In | Number Outputs | 1
In | Config_addr | PE configuration address (Register fields set to 0)
In | PHB_Unit_ID_Hi | Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | PHB_Unit_ID_Low | Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr
In | Function | 0: Deactivate the reset to the PE; 1: Activate the reset to the PE (for PCI Express, if the platform uses FLR to reset the PE, the platform provides the “ibm,pe-reset-is-flr” property in the function’s OF Device Tree node); 3: (optional) Activate the reset to the PE, using a PCI Express Fundamental Reset
Out | Status | 0: Success; -1: Hardware Error; -3: Parameter Error; -7: Unexpected state change; -8: Fundamental Reset not defined for this configuration address

R1--2. For the EEH option: After activation of the reset ( Function 1 or 3), software must delay the deactivation of the reset ( Function 0) to that PE via the ibm,set-slot-reset call, until the minimum reset signal active time has elapsed, as designated by the bus specifications for the particular type bus or buses involved (100 milliseconds for PCI). Software Implementation Notes: The device driver is responsible for any additional clean up required beyond that provided by a reset to the IOA. For PCI Express, the clean up may be slightly different based on whether the platform used FLR or Hot Reset to reset the PE. When FLR is used, the platform provides the “ibm,pe-reset-is-flr” property in the function’s OF Device Tree node. The software is responsible for quiescing (stopping) any MMIO Load and Store activities to the PE prior to resetting the PE. If the platform uses FLR to implement the PE reset, software may need to understand that this is a pulse and not a solid level, such that the adapter is not held at reset during the time from the call with Function 1 and the call with Function 0. R1--3. For the EEH option: After deactivation of the reset ( Function 0), software must delay access to that PE until the minimum time after reset that is required for the PE to become stable has elapsed, as designated by the bus specifications for the particular type bus or buses involved (for example, 1.5 seconds for PCI Express). Software Implementation Notes: Different implementations of PCI Express may require different amounts of delay in order to traverse the I/O fabric since individual component delays are plug-in card specific. The ibm,read-slot-reset-state2 RTAS call returns a PE Reset State of 5 (PE is unavailable) while any reset delay time is happening for hardware outside the PE. R1--4. 
For the EEH option: If the ibm,set-slot-reset call is called with a Function of 0 (deactivate) and any reset to the reset domain specified by the PE configuration address is active, then the RTAS call must de-activate all resets to that PE configuration address. R1--5. For the EEH option: If the ibm,set-slot-reset call is called with a Function of 0 (deactivate) and there is no operation to be performed (for example, the reset to the reset domain specified by the PE configuration address is not active), then the RTAS call must return a Status of 0 (success). R1--6. For the EEH option: When the ibm,set-slot-reset call is called with a Function of 1 or 3 (activate) with a valid PHB Unit ID and config_addr and it is the case that FLR is not being used by the platform to reset the PE, then the RTAS call must activate the reset to the reset domain as designated by the PE configuration address, if not already activated. R1--7. For the EEH option: When the ibm,set-slot-reset RTAS call implements Function 3, the platform must also provide the “ibm,reset-capabilities” property in the RTAS node of the OF device tree. R1--8. For the EEH option: When the ibm,set-slot-reset call is called with a Function of 0 (deactivate) with a valid PHB Unit ID and config_addr and if the corresponding PE is in the MMIO Stopped or DMA Stopped state, then the RTAS call must bring that PE as designated by the PE configuration address out of the MMIO Stopped and DMA Stopped states and clear any applicable platform EEH status state. R1--9. For the EEH option: When the platform uses FLR to reset a PCI Express PE ( ibm,set-slot-reset call with a Function of 1(activate) followed by a call with Function 0 (deactivate)), then the platform must provide the “ibm,pe-reset-is-flr” property in the function’s OF Device Tree node, and the platform must always use FLR to reset a PE which contains this property in the OF Device Tree. R1--10. 
For the EEH option: For a PCI Express PE, the platform must provide the EEH recovery model to the software, as defined by , regardless of whether Hot Reset or FLR is used to reset the PE.
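
The software-side timing obligations for ibm,set-slot-reset (hold the reset active for the bus-specified minimum, then wait for the PE to stabilize after deactivation) can be sketched as below. The `set_slot_reset` callable is a hypothetical OS wrapper that returns only the Status; the default delays are the example figures quoted in the requirements (100 milliseconds for PCI, 1.5 seconds for PCI Express), and real values depend on the bus specifications involved.

```python
import time


def reset_pe(set_slot_reset, pe_addr, function=1,
             hold_s=0.1, settle_s=1.5, sleep=time.sleep):
    """Pulse a PE reset with the required delays; return the final Status.

    set_slot_reset(pe_addr, function) is assumed to return the RTAS Status.
    Software is expected to have quiesced MMIO Load/Store activity beforehand.
    """
    status = set_slot_reset(pe_addr, function)  # Function 1 or 3: activate
    if status != 0:
        return status  # e.g. -7: re-read state via ibm,read-slot-reset-state2
    sleep(hold_s)      # minimum reset signal active time
    status = set_slot_reset(pe_addr, 0)         # Function 0: deactivate
    if status != 0:
        return status
    sleep(settle_s)    # minimum time before the PE may be accessed again
    return 0
```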

<emphasis>ibm,read-slot-reset-state2</emphasis> This call queries the state of a PE, and dynamically determines whether a PCI configuration address corresponds to a PE primary bus (that is, if it is the PE configuration address). In addition, when the PE Reset State parameter is a 5 (PE is unavailable), then the PE Unavailable Info indicates an approximate amount of time for which the PE might be unavailable. The PE configuration address ( PHB_Unit_ID_Hi, PHB_Unit_ID_Low, and config_addr) for the PE is obtained as defined in . When the ibm,get-config-addr-info2 RTAS call is implemented, that call can be used instead of this one to determine the PE configuration address. See and . R1--1. ibm,read-slot-reset-state2 call must be implemented using the argument call buffer defined by . Argument Call Buffer <emphasis>ibm,read-slot-reset-state2</emphasis> Parameter Type Name Values In Token Token for ibm,read-slot-reset-state2 (see Firmware Implementation note, below) Number Inputs 3 Number Outputs 4: Always allowed. 5: May be allowed, depending on the value of the “ibm,read-slot-reset-state-functions” property in the RTAS node of the device tree ( ). Config_addr Configuration Space Address (Register fields set to 0) PHB_Unit_ID_Hi Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr PHB_Unit_ID_Low Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr Out Status 0: Success -3: Parameter Error PE Reset State Except for a PE Reset State of 5, this output is not valid unless the Config_addr Capabilities output is a 1 and the Status is a 0. 
0: Reset deactivated and the PE is not in the MMIO Stopped or DMA Stopped state 1: Reset activated and the PE is not in the MMIO Stopped or DMA Stopped states 2: The PE is in the MMIO Stopped and DMA Stopped states with the reset deactivated and the Load/ Store path is disabled 4: The PE is in the DMA Stopped state with the reset deactivated and the Load/ Store path is enabled 5: PE is unavailable Config_addr Capabilities This output is not valid if the PE Reset State is a 5. 0: EEH not supported for the Config_addr 1: EEH supported for the Config_addr PE Unavailable Info This output is not valid unless the Config_addr Capabilities output is a 1 and the PE Reset State is a 5 and the Status is a 0, in which case the value of this parameter is a 0 if the PE is permanently unavailable, and non-zero if a recovery is in progress and there is an expected availability after the recovery; the non-0 value in this case is the number of milliseconds that the recovery is expected to take. PE Recovery Info This output is only valid if the “ibm,read-slot-reset-state-functions” property in the RTAS node of the device tree indicates that it is implemented and the call is made with a Number Outputs of at least 5 and the PE Reset State is a value of 2. This is a 32-bit field with bit significance, as follows: Bits 0:27 - Reserved Bits 28:29 - PE platform reset type. Only valid when bit 31 of this field is a value of 1. Bits 28:29 = 0b00: Firmware does not implement these bits. Reset type is most likely a Hot Reset. Bits 28:29 = 0b01: Platform used a Hot Reset to reset the PE. Bits 28:29 = 0b10: Platform used a Fundamental Reset to reset the PE. Bits 28:29 = 0b11: Platform used an FLR to reset the PE. Bit 30 - Retry Count Hint Bit 30 = 0: Either the PE associated with the Config_addr was the source of this PE entering the PE Reset State of 2, or the platform has not determined whether this PE was the source or not. 
Bit 30 = 1: The platform has determined that the PE associated with the Config_addr was not the source of this PE entering the PE Reset State of 2. That is, setting this bit indicates that this PE entered the PE Reset State of 2 as a side-effect of some error outside of this PE’s domain. Software may use this hint to not count this occurrence of the PE Reset State of 2 as part of any EEH error recovery retry count that it might be keeping for this PE. Bit 31 - Reset Status Bit 31 = 0: PE was not reset as a result of the platform transition to PE Reset State of 2. Bit 31 = 1: PE was reset as a result of the platform transition to PE Reset State of 2. If the PE is not below a node marked with the special value of the “status” property of “reserved”, then after deactivation of the platform-initiated PE reset, the platform is required to delay access to that PE until the minimum time after reset that is required for the PE to become stable has elapsed, as designated by the bus specifications for the particular type of bus or buses involved (for example, 1.5 seconds for PCI Express), by returning PE Reset State of 5 with PE Unavailable Info non-zero (temporarily unavailable) until that time has elapsed. If the PE is below a node marked with the special value of the “status” property of “reserved”, then after deactivation of the platform-initiated PE reset, the firmware immediately (without delay) transitions the PE to the PE Reset State of 2, and it is the controlling software that is required to do the bus-specific delays.
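The bit layout of the PE Recovery Info output can be illustrated with a small decoder. This is a minimal sketch, not part of the architecture: the type and function names are hypothetical, and it assumes the usual big-endian bit numbering of this architecture, in which bit 0 is the most-significant bit of the 32-bit field and bit 31 is the least-significant bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reset types reported in bits 28:29 of PE Recovery Info. */
enum pe_reset_type {
    PE_RESET_NOT_REPORTED = 0, /* 0b00: bits not implemented, or bit 31 is 0 */
    PE_RESET_HOT          = 1, /* 0b01: Hot Reset */
    PE_RESET_FUNDAMENTAL  = 2, /* 0b10: Fundamental Reset */
    PE_RESET_FLR          = 3  /* 0b11: FLR */
};

/* Hypothetical decoded view of the 32-bit PE Recovery Info output. */
struct pe_recovery_info {
    bool was_reset;          /* bit 31: PE was reset on entry to state 2 */
    bool not_error_source;   /* bit 30: retry count hint */
    enum pe_reset_type reset_type;
};

/*
 * Assumes bit 0 is the most-significant bit, so bit 31 is mask 0x1,
 * bit 30 is mask 0x2, and bits 28:29 are (raw >> 2) & 0x3.
 */
static struct pe_recovery_info decode_pe_recovery_info(uint32_t raw)
{
    struct pe_recovery_info info;

    info.was_reset = (raw & 0x1u) != 0;
    info.not_error_source = (raw & 0x2u) != 0;
    /* Bits 28:29 are only valid when bit 31 is 1. */
    info.reset_type = info.was_reset
        ? (enum pe_reset_type)((raw >> 2) & 0x3u)
        : PE_RESET_NOT_REPORTED;
    return info;
}
```

A caller would apply this only when the output is valid, that is, when the call was made with a Number Outputs of at least 5 and the PE Reset State is 2.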

Firmware Implementation Note: The argument call buffer structure and requirements for this call are the same as for the old (removed from this architecture) ibm,read-slot-reset-state call, except for the last output parameter. Therefore, it is possible for platforms that still require the old ibm,read-slot-reset-state RTAS call to implement the ibm,read-slot-reset-state and ibm,read-slot-reset-state2 calls with the same RTAS token and use the number of output parameters to determine whether or not to implement the PE Unavailable Info parameter. Platform Implementation Notes: The ibm,read-slot-reset-state2 RTAS call only returns a PE Reset State of 1 (Reset activated and the PE is not in the MMIO Stopped or DMA Stopped state) when the reset may be removed by software; that is, if the error is potentially recoverable. If the firmware has detected a hardware error that is such that the reset to the device cannot be removed or is not safe to remove, the ibm,read-slot-reset-state2 does not return a PE Reset State of 1, but instead returns a PE Reset State of 5 (PE is unavailable) along with PE Unavailable Info of 0 (PE is permanently unavailable). The ibm,read-slot-reset-state2 RTAS call should never return a -1 (hardware error), but should instead return a PE Reset State of 5 (PE is unavailable) with a PE Unavailable Info of 0 (PE is permanently unavailable). R1--2. The ibm,read-slot-reset-state2 RTAS call must return a Reset State value of 5 (PE is unavailable) under any of the following conditions: Firmware has determined that communication with the PE is not available or the path to the PE cannot be traversed at the current time The ibm,slot-error-detail RTAS call has been called with a Function of 2, and none of the resetting conditions specified in Requirement have been met. Software Implementation Notes: The condition under Requirement may be temporary, with a recovery time in the range of seconds (for example, as little as 3 seconds or up to a couple of minutes). 
Software may choose to delay the time indicated in the PE Unavailable Info and issue the ibm,read-slot-reset-state2 call again when a temporary condition exists. The condition may also be clearable with a power cycle of the PE, in which case the firmware may return a Status of 990x to the set-power-level RTAS call, to delay long enough to clear the temporary condition. Config_addr Capabilities may be indeterminate when the PE Reset State of 5 (PE is unavailable) is returned. Software should ignore the Config_addr Capabilities return when the PE Reset State of 5 is returned. R1--3. If the ibm,read-slot-reset-state2 RTAS call must return a PE Reset State value of 5 (PE is unavailable) then it must indicate in the PE Unavailable Info parameter one of the following: A value of zero, if there is no error recovery in progress that makes the PE available in any predictable amount of time (that is, the PE is “permanently” unavailable; for example, until a power cycle or until a repair action). A non-zero value, indicating the approximate time in milliseconds after which the path to the PE is expected to become available again. R1--4. The ibm,read-slot-reset-state2 RTAS call must return a Config_addr Capabilities of 1 (EEH supported for the Config_addr) for every Config_addr within a PE and for the PE configuration address. R1--5. The ibm,read-slot-reset-state2 RTAS call must comply with the state transitions defined in .
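The validity rules for the ibm,read-slot-reset-state2 outputs can be summarized in a short classifier. This is a sketch under stated assumptions: the enum and function names are hypothetical, and the order of checks follows the Software Implementation Notes above by ignoring Config_addr Capabilities whenever the PE Reset State is 5.

```c
#include <stdint.h>

/* Hypothetical outcome of one ibm,read-slot-reset-state2 poll. */
enum pe_poll_action {
    PE_POLL_ERROR,        /* Status was not 0 (for example, -3) */
    PE_POLL_PERMANENT,    /* State 5 with PE Unavailable Info of 0 */
    PE_POLL_RETRY_LATER,  /* State 5 with a non-zero recovery time */
    PE_POLL_NO_EEH,       /* Config_addr Capabilities is not 1 */
    PE_POLL_STATE_VALID   /* PE Reset State may be interpreted */
};

/*
 * Classifies the outputs of the call. On PE_POLL_RETRY_LATER, *wait_ms
 * receives the approximate number of milliseconds to wait before the
 * call is issued again; otherwise it is set to 0.
 */
static enum pe_poll_action classify_slot_state(int32_t status,
                                               uint32_t reset_state,
                                               uint32_t capabilities,
                                               uint32_t unavailable_info,
                                               uint32_t *wait_ms)
{
    *wait_ms = 0;
    if (status != 0)
        return PE_POLL_ERROR;
    if (reset_state == 5) {
        /* Capabilities may be indeterminate here, so it is not checked. */
        if (unavailable_info == 0)
            return PE_POLL_PERMANENT;
        *wait_ms = unavailable_info;
        return PE_POLL_RETRY_LATER;
    }
    if (capabilities != 1)
        return PE_POLL_NO_EEH;
    return PE_POLL_STATE_VALID;
}
```

Only the PE_POLL_STATE_VALID result permits the caller to act on PE Reset State values 0, 1, 2, or 4.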

<emphasis>ibm,get-config-addr-info2</emphasis> This call is used to obtain information about fabric configuration addresses, given the PCI configuration address. See for more information on PEs and determining PE configuration addresses. The PCI configuration address ( PHB_Unit_ID_Hi, PHB_Unit_ID_Low, and config_addr) for the call is defined by . R1--1. The ibm,get-config-addr-info2 call must be implemented using the argument call buffer defined by . Argument Call Buffer <emphasis>ibm,get-config-addr-info2</emphasis> Parameter Type Name Values In Token Token for ibm,get-config-addr-info2 (see Firmware Implementation note, below) Number Inputs 4 Number Outputs 2 Config_addr Configuration Space Address (Register fields set to 0) PHB_Unit_ID_Hi Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr PHB_Unit_ID_Low Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr Function See for available functions. Out Status 0: Success -3: Parameter Error Info See for values.

R1--2. The ibm,get-config-addr-info2 RTAS call must return the Info output as per . Input and <emphasis>ibm,get-config-addr-info2 Function</emphasis> <emphasis>Info</emphasis> Output Function Input Definition Info Output 0 Get PE configuration address PE configuration address (as defined by ). Result returned in Info output is: Equal to the Config_addr input if there is no bridge or switch between the IOA function (endpoint) and the PE primary bus. Equal to the Config_addr of the PE primary bus if there is a bridge or switch between the IOA function and the PE primary bus. Undefined if Config_addr is not in a PE (query for PE state by using Function 1 first or by ibm,read-slot-reset-state2 RTAS call). A Status of -3 (Parameter Error) is returned in this case, also. 1 Query shared PE state 0: Config_addr is not in a PE (EEH not supported for the Config_addr). 1: Not shared (Only one IOA function in the PE). 2: Shared (More than one IOA function in the PE).
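As an illustration of the argument call buffer for ibm,get-config-addr-info2, the following sketch fills in the input cells in the order listed in the table above and splits a 64-bit PHB Unit ID into its Hi and Low halves. The structure and function names are hypothetical, the token value is assumed to have been read from the device tree by the caller, and the sketch does not address cell byte order or how the buffer is handed to RTAS.

```c
#include <stdbool.h>
#include <stdint.h>

#define GCAI2_NUM_INPUTS      4u
#define GCAI2_NUM_OUTPUTS     2u
#define GCAI2_FN_GET_PE_ADDR  0u  /* Function 0: get PE configuration address */
#define GCAI2_FN_QUERY_SHARED 1u  /* Function 1: query shared PE state */

/* Hypothetical in-memory image of the argument call buffer. */
struct gcai2_args {
    uint32_t token;
    uint32_t num_inputs;
    uint32_t num_outputs;
    uint32_t config_addr;      /* register fields set to 0 */
    uint32_t phb_unit_id_hi;
    uint32_t phb_unit_id_low;
    uint32_t function;
    int32_t  status;           /* output: 0 success, -3 parameter error */
    uint32_t info;             /* output: meaning depends on function */
};

static struct gcai2_args gcai2_build(uint32_t token, uint32_t config_addr,
                                     uint64_t phb_unit_id, uint32_t function)
{
    struct gcai2_args a;

    a.token = token;
    a.num_inputs = GCAI2_NUM_INPUTS;
    a.num_outputs = GCAI2_NUM_OUTPUTS;
    a.config_addr = config_addr;
    a.phb_unit_id_hi = (uint32_t)(phb_unit_id >> 32);
    a.phb_unit_id_low = (uint32_t)(phb_unit_id & 0xFFFFFFFFu);
    a.function = function;
    a.status = 0;
    a.info = 0;
    return a;
}

/* For Function 1 results: true when more than one IOA function shares the PE. */
static bool gcai2_pe_is_shared(const struct gcai2_args *a)
{
    return a->status == 0 && a->info == 2u;
}
```

Because the Function 0 result is undefined when the Config_addr is not in a PE, a caller would typically issue Function 1 first and proceed to Function 0 only when Info is 1 or 2.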

<emphasis>ibm,slot-error-detail</emphasis> This call combines device driver information, as gathered by the device driver prior to this call, with information derived by firmware from the platform’s I/O infrastructure to create a detailed event log concerning a recoverable EEH event. In this way, both OS and platform maintenance applications have access to all the information about a given event. In addition, the OS can mark a PE configuration address as being in an unavailable state due to excessive errors. The caller supplies the device driver information, referenced by the Device_Driver_Error_Buffer argument. The Returned_Error_Buffer argument points to a buffer that this call fills with valid error log data as defined in the error log format section. Different platforms log using different versions of the error logging format. The error log data may include platform specific data as well as device driver data passed in the Device_Driver_Error_Buffer. Regardless of the error log version used, the data in the Returned_Error_Buffer is in an extended log format as defined in . When the call returns data for version 6 or greater, the device driver error buffer data is included as the last User Data section. The device driver data in the return buffer may be truncated from what is passed by the device driver or completely eliminated as necessary to ensure that the returned buffer length is not exceeded. The Config_addr supplied is the PE configuration address ( PHB_Unit_ID_Hi, PHB_Unit_ID_Low, and config_addr) for the PE, obtained as defined in , or a configuration address within the PE. The I/O fabric information that is captured by the platform consists of useful PCI configuration state at and above the supplied Config_addr. This RTAS call supports both plug-in PCI cards and built-in PCI IOAs. In this section, the term unavailable, when applied to a PE, means that ibm,read-slot-reset-state2 would return a PE Reset State of 5 (PE is unavailable) at the current time. 
R1--1. For the EEH option: The argument call buffer for the ibm,slot-error-detail call must correspond to the definition given in . Argument Call Buffer <emphasis>ibm,slot-error-detail</emphasis> Parameter Type Name Values In Token Token for ibm,slot-error-detail Number Inputs 8 Number Outputs 1 Config_addr Configuration address (Register numbers set to 0) PHB_Unit_ID_Hi Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr PHB_Unit_ID_Low Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr Device_Driver_Error_Buffer Real address of an error log buffer containing device driver debug data. This data is integrated into the final error log Device_Driver_Error_Buffer_Length Length of the Device_Driver_Error_Buffer Returned_Error_Buffer Real address of an error log buffer to contain a compliant error log entry composed by the RTAS Returned_Error_Buffer_Length Length of the Returned_Error_Buffer Function 1: Temporary Error 2: Permanent Error Out Status 1: No Error Log Returned 0: Success -1: Hardware Error (cannot create log)

R1--2. The Returned_Error_Buffer format must be the same as implemented by event-scan on the platform. R1--3. To prevent standard error log record truncation, the Returned_Error_Buffer_Length must equal the value of the OF device tree property “rtas-error-log-max”. R1--4. If the PE corresponding to the Config_addr is in the MMIO Stopped or DMA Stopped state, then the ibm,slot-error-detail RTAS call must return a Status of 0 and an error log that defines the FRU or FRUs to which the error is isolated. R1--5. If communication with the Config_addr is not available, the path to the Config_addr cannot be traversed at the current time, or this call has previously been made with a Function of 2 and none of the conditions that reset this state have been met (that is, the PE is unavailable), then the ibm,slot-error-detail RTAS call must return a Status of 0 and an error log that defines the FRU or FRUs to which the error is isolated. R1--6. If the conditions in Requirements and are not met, then the ibm,slot-error-detail RTAS call must return a Status of 1, no error found, with no error log entry returned. Software and Platform Implementation Note: In some cases, the platform may return an information-only error log to meet Requirements and . For example, in some implementations this might be appropriate if the actual error was already logged via another RTAS call or this call was previously made with a Function of 2 and none of the conditions that reset this state have been met. R1--7. Once a PE is unavailable and in the absence of any state-resetting action by the OS that clears the corresponding PE configuration address EEH error (for example, reset or power cycle), the platform must return an error log in response to the ibm,slot-error-detail RTAS call. R1--8. 
Once a PE has experienced a state-resetting action by the OS that clears the corresponding PE configuration address EEH error (for example, reset or power cycle), that makes the PE available, the platform must return a Status of 1, no error found, with no error log entry in response to the ibm,slot-error-detail RTAS call. R1--9. If the ibm,slot-error-detail RTAS call Device_Driver_Error_Buffer_Length argument is non-zero, indicating the existence of optional device driver error data, the referenced buffer must contain an extended event log as defined in . R1--10. (Requirement Number Reserved For Compatibility) R1--11. When the ibm,slot-error-detail RTAS call returns an extended log debug record in the buffer specified by the Returned_Error_Buffer argument as mandated by Requirements and it must truncate the record at the length specified by the Returned_Error_Buffer_Length argument. R1--12. If a Function of 2 is passed to the ibm,slot-error-detail RTAS call, RTAS must unconditionally set the state of the PE corresponding to the Config_addr to permanently unavailable; that is, any subsequent calls to ibm,read-slot-reset-state2 return a PE Reset State of 5 (PE is unavailable) with the PE Unavailable Info argument set to zero. R1--13. RTAS must not change a PE Reset state of permanently unavailable unless one of the following occur: A PCI Hot Plug condition for the slot is encountered (as determined by the power being turned off and then on for the slot) The power domain is power cycled for another reason (for example, a power down of the OS image that owns the IOA) The state is cleared by a partition reboot or a dynamic LPAR reassignment of the PCI configuration address. R1--14. 
After a PE enters the MMIO and DMA Stopped States due to an error, the platform must keep cached error information relative to that error, for reporting via the ibm,slot-error-detail RTAS call, until any one of the following events occurs: The ibm,slot-error-detail RTAS call is called and the error information is returned. The reset to the PE is activated via the ibm,set-slot-reset RTAS call. The removal of the PE from the DMA Stopped State via Function 3 of the ibm,set-eeh-option RTAS call. The start of a DR operation as signalled by the calling of set-indicator with isolation-state set to isolate. R1--15. Prior to calling the ibm,slot-error-detail RTAS call, the PE which includes the Config_addr must not be in the MMIO Stopped State, if the maximum amount of useful information is to be captured, as defined by Requirement . R1--16. The firmware implementing the ibm,slot-error-detail is responsible for gathering the PCI fabric configuration space registers, including those at the specified Config_addr, and also any other non-PCI I/O fabric registers that might be useful for debug purposes (for example, internal PHB registers), with the suggested appropriate minimum set of PCI configuration registers captured for each PCI device being as indicated in . 
Suggested Minimum PCI Configuration Registers to Capture for <emphasis>ibm,slot-error-detail</emphasis> Data Structure Offset within the Data Structure Register Base PCI Configuration Space Header (for all PCI devices) 0x00 Vendor ID 0x02 Device ID 0x04 Command 0x06 Status 0x08 Revision ID 0x09 Class Code Type 0 Configuration Space Header (for non-PCI bridges only) 0x2C Subsystem Vendor ID 0x2E Subsystem ID Type 1 Configuration Space Header (for PCI bridges only) 0x1E Secondary Status PCI-X Capabilities List (for all PCI-X devices) 0x02 PCI-X Command (Type 0 Configuration Header) PCI-X Secondary Status (Type 1 Configuration Header) 0x04 PCI-X Status (Type 0 Configuration Header) PCI-X Bridge Status (Type 1 Configuration Header) PCI Express Capabilities Structure (for all PCI Express devices) 0x02 PCI Express Capabilities 0x04 Device Capabilities 0x08 Device Control 0x0A Device Status 0x0C Link Capabilities 0x10 Link Control 0x12 Link Status 0x14 Slot Capabilities 0x18 Slot Control 0x1A Slot Status 0x1C Root Control 0x1E Root Capabilities 0x20 Root Status Advanced Error Reporting Capability (for all devices implementing AER) 0x00 PCI Express Enhanced Capability Header 0x04 Uncorrectable Error Status 0x08 Uncorrectable Error Mask 0x0C Uncorrectable Error Severity 0x10 Correctable Error Status 0x14 Correctable Error Mask 0x18 Advanced Error Capabilities and Control 0x1C Header Log 0x2C Root Error Command (Root Ports only) 0x30 Root Error Status (Root Ports only) 0x34 Correctable Error Source Identification (Root Ports only) 0x36 Error Source Identification (Root Ports only)

R1--17. If the ibm,slot-error-detail RTAS call is made with the PE in the PE state of 2 (as defined by ), then the platform must not remove the PE from that state in order to probe the PCI fabric. R1--18. If the ibm,slot-error-detail RTAS call is made with the PE in the PE state of 4 (as defined by ), then the ibm,slot-error-detail RTAS call must return with the PE in the PE state of 4, except that if an error occurs in the course of probing the PCI fabric that requires a reset of the PE by the platform, then discontinue probing, return a Status of 0 or 1 (as appropriate), and return the PE in the PE state of 2. Software and Platform Implementation Notes: In Requirement , it is possible, as a part of the firmware probing the fabric, that the PE will transition temporarily to a PE state of 2, in the case where another EEH event occurs as part of the firmware probing the fabric. If the EEH event does not require a reset of the PE for these subsequent EEH events, then the firmware may transition the PE back to the PE state of 4, to continue probing. Several of these PE state 4->2->4 events may occur as a result of probing the fabric. In Requirement if an EEH event occurs as a result of probing that fabric that results in a reset of the PE, the returned PE state of 2 does not necessarily need to be checked for by the software on return from the call. The case where this occurs is expected to be rare, and probably signals a non-transient error. In this case the software can continue on with the recovery phase of the EEH processing, and will eventually hit the same event on further processing.
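The interaction between Function 2, the permanently unavailable state, and the Status returned by ibm,slot-error-detail can be modeled in a few lines. This is a simplified sketch of the requirements above, not firmware code: the structure and helper names are hypothetical, and the model collapses hot plug, power cycle, partition reboot, and dynamic LPAR reassignment into a single state-clearing event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal model of the per-PE state kept by the platform. */
struct pe_model {
    bool stopped;               /* MMIO Stopped or DMA Stopped */
    bool permanently_unavail;   /* set by a Function of 2 */
};

/* Effect of calling ibm,slot-error-detail with a Function of 2. */
static void pe_mark_permanent(struct pe_model *pe)
{
    pe->permanently_unavail = true;
}

/* Stands in for any of the events that clear the permanent state. */
static void pe_state_clearing_event(struct pe_model *pe)
{
    pe->stopped = false;
    pe->permanently_unavail = false;
}

/* Status expected from ibm,slot-error-detail: 0 with a log, or 1 with none. */
static int32_t pe_expected_error_detail_status(const struct pe_model *pe)
{
    return (pe->stopped || pe->permanently_unavail) ? 0 : 1;
}

/* PE Reset State that ibm,read-slot-reset-state2 would report in this model;
 * returns 5 for the permanent case and -1 when the model cannot tell. */
static int32_t pe_modeled_reset_state(const struct pe_model *pe)
{
    return pe->permanently_unavail ? 5 : -1;
}
```

In this model, a PE marked with Function 2 keeps reporting a PE Reset State of 5 and a Status of 0 until the state-clearing event occurs, after which the expected Status is 1 (no error found).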

Bridged-I/O EEH Support Option The Bridged-I/O EEH Support Option provides RTAS calls for restoring the boot time configuration of EEH error domains that contain multiple IOAs or multi-function IOAs (for example, multi-function I/O cards which are constructed by placing multiple IOAs beneath a PCI to PCI bridge or PCI Express switch). During EEH recovery, the IOA is subject to a full hardware reset. These calls recreate any configuration changes, from full hardware reset, that the firmware normally makes during platform boot prior to turning the IOA over to the client program plus any subsequent changes made via ibm,change-msi. Once these calls restore the IOA initial configuration plus interrupt changes, it is the responsibility of the device driver, as part of its EEH recovery procedure, to finish the configuration restoration with any non-interrupt changes it makes to the IOA. Bridge types supported by these calls include PCI to PCI bridges (for example, a PCI to PCI bridge on an I/O plug-in card) and PCI Express bridges and switches. This option does not address the initialization of bridges and switches which are outside of all PEs. Those are the platform’s responsibility. If there is no supported bridge or switch at the PE configuration address specified by the input parameters, then these calls return a “success” without configuring anything, and therefore these calls can be made for all EEH recovery events, regardless of the type of I/O present. The PE configuration address ( PHB_Unit_ID_Hi, PHB_Unit_ID_Low, and config_addr) for the PE is obtained as defined in . Software Implementation Note: Neither ibm,configure-bridge nor ibm,configure-pe restores changes to an IOA’s post boot configuration registers except as made through the ibm,change-msi RTAS call (for example, to the point of being able to issue PCI memory space MMIO operations to the IOA, or perform DMA operations from the IOA). 
It is the software’s responsibility to restore any post boot non interrupt changes it made to the IOA’s PCI configuration space registers after calling one of these two RTAS calls.

<emphasis>ibm,configure-bridge</emphasis> R1--1. For the Bridged-I/O EEH Support option: The ibm,configure-bridge call must implement the argument call buffer defined by . Argument Call Buffer <emphasis>ibm,configure-bridge</emphasis> Parameter Type Name Values In Token Token for ibm,configure-bridge Number Inputs 3 Number Outputs 1 Config_addr PE configuration address (Register fields set to 0) PHB_Unit_ID_Hi Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr PHB_Unit_ID_Low Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr Out Status 990x: Extended delay where x is a number 0-5 (see Software Implementation note below) 0: Success -3: Parameter Error

<emphasis>ibm,configure-pe</emphasis> This call has about the same semantics as the ibm,configure-bridge RTAS call, except that it: Has the additional semantics of bypassing the configuration process if the PE has not previously been reset by the platform as a result of entering the EEH Stopped State. Configures all the configuration spaces within the PE, including those of the endpoint devices within the PE (see Requirement ). Thus, this RTAS call can be made at the beginning of any EEH processing. R1--1. For the Bridged-I/O EEH Support option: The ibm,configure-pe call must implement the argument call buffer defined by . Argument Call Buffer <emphasis>ibm,configure-pe</emphasis> Parameter Type Name Values In Token Token for ibm,configure-pe Number Inputs 3 Number Outputs 1 Config_addr PE configuration address (Register fields set to 0) PHB_Unit_ID_Hi Represents the most-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr PHB_Unit_ID_Low Represents the least-significant 32-bits of the Unit ID of the PHB that corresponds to the config_addr Out Status 990x: Extended delay where x is a number 0-5 (see Software Implementation note below) 0: Success -3: Parameter Error

Software Implementation Note: When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling ibm,configure-pe again. However, software may issue the ibm,configure-pe call again either earlier or later than this. Firmware Implementation Note: This call needs to limit the long busy to 9900-9902 with at most a total of 1/5 second before the ibm,configure-pe succeeds. Any longer delays may cause subsequent hardware or application failures. For hardware errors, return a Status of 0 (Success). Hardware errors are subsequently discovered by further accesses to the PE and additional EEH events. R1--2. The caller of ibm,configure-pe must provide the PE configuration address, otherwise the RTAS call returns a -3, “Parameter Error”. R1--3. If the specified PE has been configured after the last platform or OS initiated reset to the specified PE with ibm,configure-connector, ibm,configure-bridge, or ibm,configure-pe, then the call must return with a Status of 0 (Success) without doing any bridge or switch configuration, otherwise the call must set up the configuration spaces of all the PCI to PCI bridges, PCI Express bridges, PCI Express switches, and endpoint functions within the PE, the way they were delivered at boot time except with all sticky error bits left intact, any changes made by calls to ibm,change-msi retained, and must do so with a single sequence of calls to ibm,configure-pe. Software Implementation Notes: The configuration of endpoint functions (the “and endpoint functions” part) in Requirement was added to the architecture after the firmware without that functionality in the ibm,configure-pe RTAS call was shipping. Therefore, any device driver that might run legacy implementations needs to be prepared to restore all endpoint function config spaces, since the ibm,configure-pe RTAS call might not. 
The ibm,configure-pe RTAS call does not restore non-interrupt configuration space changes that were made after boot (that is, under direction of the device driver or OS). Therefore, use of the ibm,configure-pe RTAS call does not absolve the device driver or OS from restoring the non-interrupt changes that were made to the PCI configuration space after boot (see Requirement ). R1--4. The ibm,configure-pe call must only return a Status of 990x if one of the following conditions is true: The operation was not started. Firmware is able to restart the same call for this PE even when other intervening calls to ibm,configure-pe have occurred (That is, OSs are not required to serialize calls to ibm,configure-pe). R1--5. Software must complete all MMIO operations to the IOAs within a PE prior to calling the ibm,configure-pe RTAS call for a PE and must not issue new MMIO operations to the IOAs within the specified PE until after the RTAS call is complete. R1--6. On return from the ibm,configure-pe RTAS call, the platform must have the PE in the same EEH state (as defined by ) as when the call was made, except that if an error occurs in the course of probing the PCI fabric that requires a reset of the PE by the platform, then discontinue probing, return a Status of 0 or 1 (as appropriate), and return the PE in the PE state of 2. Software and Platform Implementation Note: Given Requirements and , it is permissible for the platform to temporarily transition the PE from a PE state of 2 to PE state of 4, if the call is made with a PE state of 2 but the hardware requires a PE state of 4 to get access to the PCI fabric. It is also permissible for the platform to go through several of these state changes during the execution of the call if there are errors that occur during the course of probing the PCI fabric that put the PE back into the PE state of 2. 
In Requirement if an EEH event occurs as a result of probing that fabric that results in a reset of the PE, the returned PE state of 2 does not necessarily need to be checked for by the software on return from the call. The case where this occurs is expected to be rare, and probably signals a non-transient error. In this case the software can continue on with the recovery phase of the EEH processing, and will eventually hit the same event on further processing.
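The suggested handling of a 990x Status (delay for 10 raised to the x milliseconds, then call again) can be sketched as follows. The function names and the callback types are hypothetical; the caller supplies both the routine that issues the RTAS call and the routine that sleeps, so the sketch makes no assumption about how either is implemented.

```c
#include <stdint.h>

/*
 * Returns the suggested delay in milliseconds for a 990x Status
 * (9900 -> 1, 9901 -> 10, ... 9905 -> 100000), or 0 for any other Status.
 */
static uint32_t rtas_extended_delay_ms(int32_t status)
{
    uint32_t ms = 1;
    int32_t x;

    if (status < 9900 || status > 9905)
        return 0;
    for (x = status - 9900; x > 0; x--)
        ms *= 10;
    return ms;
}

/* Hypothetical callbacks: issue the call once, and sleep for a duration. */
typedef int32_t (*rtas_call_fn)(void *ctx);
typedef void (*sleep_ms_fn)(uint32_t ms);

/*
 * Repeats the call while it returns 990x, sleeping for the suggested time
 * between attempts. Returns the last Status seen, which is still a 990x
 * value if max_attempts is exhausted (or 9900 if max_attempts is 0 and
 * the call was never issued).
 */
static int32_t call_with_extended_delay(rtas_call_fn call, void *ctx,
                                        sleep_ms_fn sleep_ms,
                                        unsigned max_attempts)
{
    int32_t status = 9900;
    unsigned attempt;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        uint32_t delay;

        status = call(ctx);
        delay = rtas_extended_delay_ms(status);
        if (delay == 0)
            return status;
        sleep_ms(delay);
    }
    return status;
}
```

For ibm,configure-pe, the Firmware Implementation Note limits the long busy to 9900-9902, so the per-attempt delay computed here would be at most 100 milliseconds. Software remains free to retry earlier or later than the suggested time.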

Error Injection Option The Error Injection option (ERRINJCT) allows testing software to check out the OS’s error paths. This architecture defines the following abstract error categories: Fatal: Platform Architectural state has been corrupted to an unknown extent. Further valid processing is not possible. Recovered Random Event: The Central Electronics Complex (CEC) experienced an anomaly. However, platform architectural state has been preserved/restored. The OS should log the event and continue processing. Recovered Special Event: The CEC has experienced a statistically significant anomaly. While platform architectural state has been preserved/restored, the OS should log the event and discontinue the use of this processor as soon as possible to avoid a fatal situation. Corrupted Page: The System Memory page (Up to 4 KB) contains uncorrectable errors. The OS should log the event and avoid accessing this page in the future. The OS recovery is possible given that it can either recover the page from backing storage or isolate the error from unaffected processes. Corrupted SLB: The processor’s Segment Look-aside Buffer is corrupted. The OS should log the event and can recover if it can repopulate the SLB from internal tables. Translator Failure: The processor’s virtual to real translation hardware has failed. The processor’s architectural state has been preserved in System Memory. The OS may be able to continue the failed processor’s program and log the event on an alternate processor in the future. IOA Bus Error: An error has occurred on the I/O bus on which an I/O Adapter (IOA) is attached. IOA or Device driver recovery from the error is possible if the error is such that it is reported to the IOA. Device driver recovery of the IOA’s operations is possible when the error is not reported to the IOA, if the EEH option is implemented and enabled. The ERRINJCT option RTAS call performs a platform dependent accurate simulation of the abstract error requested. 
In some cases, the platform hardware actually injects an error into the hardware. In other cases, the platform may simply report the anomaly without generating an error. Additionally, the ERRINJCT option provides access to platform specific error injection logic for the benefit of platform aware test software. R1--1. For the ERRINJCT option: RTAS must implement the ibm,open-errinjct call using the argument buffer defined by . Argument Call Buffer <emphasis>ibm,open-errinjct</emphasis> Parameter Type Name Values In Token Token for ibm,open-errinjct Number Inputs 0 Number Outputs 2 Out Open Token If Status is 0, then use this Open Token for corresponding ibm,errinjct and ibm,close-errinjct calls Status 990x: Extended delay, where x is a number 0-5 (see text) 0: Success -1: Hardware Error -2: Busy, try again later -4: Already open -5: PCI Error Injection is not enabled (not available)

When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling with the same parameters. However, software may issue the call again either earlier or later than this. Architecture Note: The output buffer is intentionally reversed from what it should be, according to Requirement (that is, Status not first output), due to code that was implemented and shipped as defined, above. R1--2. For the ERRINJCT option: On successful completion of the ibm,open-errinjct call, Firmware must return an Open Token which uniquely identifies the caller on following ibm,close-errinjct and ibm,errinjct calls (Firmware may also need to keep around other information about the caller that uniquely identifies the caller when correlated with the Open Token) and must allocate the ERRINJCT facilities to this caller until this same user calls ibm,close-errinjct. R1--3. For the ERRINJCT option: If the ERRINJCT facility has been previously opened, a call to ibm,open-errinjct must return a -4. R1--4. For the ERRINJCT option: RTAS must implement the ibm,close-errinjct call using the argument buffer defined by . Argument Call Buffer <emphasis>ibm,close-errinjct</emphasis> Parameter Type Name Values In Token Token for ibm,close-errinjct Number Inputs 1 Number Outputs 1 Open Token Open Token that was returned on the corresponding ibm,open-errinjct call Out Status 990x: Extended delay where x is a number 0-5 (see text) 0: Success -1: Hardware Error -2: Busy, try again later -4: Close Error (User is not the one that opened the ERRINJCT facility or facility not open)
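Two points from the text above lend themselves to a short sketch: the ibm,open-errinjct outputs arrive with the Open Token first and the Status second, and the facility is owned by a single opener until that opener closes it. The names below are hypothetical, and the token allocation scheme (a simple counter) is an assumption made only for illustration; real firmware may correlate additional caller information with the Open Token.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Reads the two ibm,open-errinjct outputs. Unlike other calls, output 0 is
 * the Open Token and output 1 is the Status. *open_token is written only
 * when the Status is 0.
 */
static int32_t open_errinjct_parse(const uint32_t outputs[2],
                                   uint32_t *open_token)
{
    int32_t status = (int32_t)outputs[1];

    if (status == 0)
        *open_token = outputs[0];
    return status;
}

/* Hypothetical model of the single-owner ERRINJCT facility. */
struct errinjct_facility {
    bool     is_open;
    uint32_t open_token;   /* token held by the current owner */
    uint32_t next_token;   /* illustrative allocator: a simple counter */
};

/* Models ibm,open-errinjct: -4 when the facility is already open. */
static int32_t errinjct_open(struct errinjct_facility *f, uint32_t *open_token)
{
    if (f->is_open)
        return -4;
    f->open_token = ++f->next_token;
    f->is_open = true;
    *open_token = f->open_token;
    return 0;
}

/* Models ibm,close-errinjct: -4 when not open or when the token does not
 * match, in which case the facility stays allocated to its owner. */
static int32_t errinjct_close(struct errinjct_facility *f, uint32_t open_token)
{
    if (!f->is_open || f->open_token != open_token)
        return -4;
    f->is_open = false;
    return 0;
}
```

The model omits the 990x, -1, -2, and -5 Status values, which depend on platform conditions outside the ownership rule.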

When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling with the same parameters. However, software may issue the call again either earlier or later than this. R1--7. For the ERRINJCT option: If the ERRINJCT facility is not open or was not previously allocated to the user via an ibm,open-errinjct call (that is, the Open Token along with any other pertinent data does not correspond with the user that opened the facility via the ibm,open-errinjct call), then a call to ibm,errinjct must return a -4 and the facility must remain open for use by the user that originally opened the facility. R1--8. For the ERRINJCT option: The platform must include the “ibm,errinjct-tokens” property as defined below in the /rtas node (see ) of the OF device tree with a specification for each implemented error injection class. R1--9. For the ERRINJCT option: The errinjct-token-names must be taken from the list provided in . Errinjct-token-names Errinjct-token-name Errinjct function fatal Simulate a platform fatal error. recovered-random-event Simulate a recovered random event recovered-special-event Simulate a recovered special (statistically significant) event corrupted-page Corrupt the specified location (and potentially surrounding locations up to the containing page) corrupted-slb Corrupt the SLB entry associated with a specific effective address. translator-failure Simulate a translator failure. ioa-bus-error Simulate an error on an IOA bus - 32 bit address specification only. ioa-bus-error-64 Simulate an error on an IOA bus - 64 bit address specification. platform-specific Request the firmware perform a platform specific error injection. 
corrupted-dcache-start Start causing a L1 data cache error corrupted-dcache-end Stop causing a L1 data cache error corrupted-icache-start Start causing an instruction cache error corrupted-icache-end Stop causing an instruction cache error corrupted-tlb-start Start corrupting TLB corrupted-tlb-end Stop corrupting TLB upstream-IO-error Inject I/O error above the IOA

R1--10. For the ERRINJCT option: For the errinjct-tokens implemented RTAS must use the work buffer format specified in . Errinjct Work Buffer Formats Errinjct-token-name Errinjct work buffer format fatal Undefined recovered-random-event Undefined recovered-special-event “1” for a non-persistent cpu recoverable error “2” for a persistent CPU recoverable error corrupted-page The first cell contains the upper 32 bits of the real address to corrupt. The second cell contains the lower 32 bits of the real address to corrupt. corrupted-slb The first cell contains the effective address associated with the SLB entry to corrupt translator-failure Undefined ioa-bus-error The first word contains I/O bus address, word aligned, at which to inject the error. The second word is a mask used to mask off up to 24 of the least significant bits of the address which are not to be used in the comparison of address for error injection (a 0 in a bit position masks off the bit, a 1 in the bit position enables the bit to be used in the compare). The third word is the config_addr on the bus which is to receive the injected error. The fourth word is the PHB_Unit_ID_Hi of the PHB that corresponds to the config_addr. The fifth word is the PHB_Unit_ID_Low of the PHB that corresponds to the config_addr. The sixth word defines the specifics of when and what to inject, as follows: See for values 0 through 19. 20: (Optional) Disable PCI error injection for the specified bus 21: Obtain current error inject values. When RTAS returns SUCCESS in the Status field the work buffer field values are populated with the current error injected. ioa-bus-error-64 The first and second words contain the I/O bus address, double word aligned, at which to inject the error. 
The third and fourth words are a mask used to mask off up to 64 of the least significant bits of the address which are not to be used in the comparison of address for error injection (a 0 in a bit position masks off the bit, a 1 in the bit position enables the bit to be used in the compare). The fifth word is the config_addr of an IOA on the bus which is to receive the injected error. The sixth word is the PHB_Unit_ID_Hi of the PHB that corresponds to the config_addr. The seventh word is the PHB_Unit_ID_Low of the PHB that corresponds to the config_addr. The eighth word defines the specifics of when and what to inject, as follows: See for values 0 through 19. 20: (Optional) Disable PCI error injection for the specified bus 21: Obtain current error inject values. When RTAS returns SUCCESS in the Status field the work buffer field values are populated with the current error injected. platform-specific See platform firmware documentation (RTAS component specifications) for working buffer format for any particular platform. corrupted-dcache-start corrupted-dcache-end The first cell defines the specific action to take: 0: Parity error 1: D-ERAT parity error 2: tag parity error The second cell defines the nature of the error: 0: single 1: solid 2: hang Supported injection modes are hardware specific and all modes may not be supported on all hardware. The first supported injection mode in the above list will be used if an unsupported mode is specified (that is, first single, then solid, then hang-pulse). If none of the above modes are available, then the injection option most similar to single in functionality will be used. 
corrupted-icache-start corrupted-icache-end The first cell defines the specific action to take: 0: parity error 1: I-ERAT parity error 2: cache directory 0 parity error 3: cache directory 1 parity error The second cell defines the nature of the error: 0: single 1: solid 2: hang corrupted-tlb-start corrupted-tlb-end The first cell defines the nature of the error: 0: single 1: solid 2: hang Supported injection modes are hardware specific and all modes may not be supported on all hardware. The first supported injection mode in the above list will be used if an unsupported mode is specified (that is, first single, then solid, then hang-pulse). If none of the above modes are available, then the injection option most similar to single in functionality will be used.
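As an illustration of the work buffer formats above, the following sketch packs the corrupted-page, ioa-bus-error, and ioa-bus-error-64 buffers. It assumes 32-bit cells and words stored big-endian, which this section does not state explicitly; the function and parameter names are hypothetical.

```python
import struct


def corrupted_page_buffer(real_addr):
    """Two cells: upper 32 bits, then lower 32 bits, of the real address."""
    return struct.pack(">II", (real_addr >> 32) & 0xFFFFFFFF,
                       real_addr & 0xFFFFFFFF)


def ioa_bus_error_buffer(bus_addr, mask, config_addr,
                         phb_unit_id_hi, phb_unit_id_low, function):
    """Six words for ioa-bus-error (32-bit, word-aligned bus address)."""
    if bus_addr & 0x3:
        raise ValueError("bus address must be word aligned")
    if not 0 <= function <= 21:
        raise ValueError("sixth word must be in the range 0-21")
    return struct.pack(">6I", bus_addr, mask, config_addr,
                       phb_unit_id_hi, phb_unit_id_low, function)


def ioa_bus_error_64_buffer(bus_addr, mask, config_addr,
                            phb_unit_id_hi, phb_unit_id_low, function):
    """Eight words for ioa-bus-error-64 (64-bit, doubleword-aligned address)."""
    if bus_addr & 0x7:
        raise ValueError("bus address must be double word aligned")
    if not 0 <= function <= 21:
        raise ValueError("eighth word must be in the range 0-21")
    # The address and the mask each occupy two consecutive 32-bit words.
    return struct.pack(">QQ4I", bus_addr, mask, config_addr,
                       phb_unit_id_hi, phb_unit_id_low, function)
```

In both mask fields a 1 bit keeps the corresponding address bit in the comparison and a 0 bit masks it off, so a mask of all ones matches exactly one address.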

Programming Note: Options having a “-start” and corresponding “-end” must be called in pairs on the same processor. The corresponding “-end” option should be called after the injected error has been noticed and processed by the caller. On the same processor, other error inject options should not be called between a “-start” and “-end” sequence. However, it is possible to inject the same type of error multiple times by calling “-start” on that CPU as long as the “nature of error” is “single”. The buffer contents should be the same for a “-start” and corresponding “-end”. While not recommended, “-end” can be replaced with a call to ibm,close-errinjct, but improper cleanup of the machine may result, with the machine left in an unknown state. R1--11. For the ERRINJCT option: If the platform notifies the OS of a specific CEC error using the machine check interrupt in response to an ibm,errinjct RTAS call, the platform must do so only when the processor’s MSR[RI] bit is active, unless said error is fatal or involves accessing a storage location that has itself been corrupted or is accessed through a corrupted SLB entry. R1--12. For the ERRINJCT option with the LPAR option: Hypervisor RTAS must allow a partition to corrupt only its own memory pages. R1--13. For the ERRINJCT option with the LPAR option: Hypervisor RTAS must allow a partition to inject IOA bus errors only if all of the following are true: The IOA bus is not shared with other partitions. The EEH option is implemented and enabled for the bus on which the error injection is requested. R1--14. For the ERRINJCT option with the LPAR option: The platform must allow at most one partition to issue platform-specific errinjct calls. R1--15. For the SPLPAR option: The platform must either implement actual hardware error injection with these interfaces, or must fabricate appropriate partition behavior (machine check, error logs, etc.) as if the hardware error had happened. R1--16. 
For the Multi-threading Processor option: All threads on the processor on which the error is injected must be prepared to handle the error. R1--17. For the Error Injection option: The software using the ibm,errinjct call must be prepared to receive a -3 for non-implemented errinjct work buffer formats. R1--18. For the ioa-bus-error and ioa-bus-error-64 functions of the ERRINJCT option: For each ibm,errinjct RTAS call invocation, the platform must inject the error specified in the working buffer at most once. Semantics for <emphasis>ioa-bus-error</emphasis> <emphasis>ioa-bus-error</emphasis> Sixth Word and <emphasis>ioa-bus-error-64</emphasis> Eighth Word Values 0-19 Operation Address Space Cell Value Conventional PCI, PCI-X Mode 1, PCI-X Mode 2 PCI Express Load PCI Memory 0 Inject an Address Parity Error Inject a TLP ECRC Error For PHB implementations that do not allow injection of a TLP ECRC error into the request, or for the case where the injection would be in violation of Requirement due to the hardware configuration, the platform should emulate the error by setting the appropriate error state in the PHB when EEH is enabled. 1 Inject a Data Parity Error PCI I/O 2 Inject an Address Parity Error 3 Inject a Data Parity Error PCI Configuration 4 Inject an Address Parity Error 5 Inject a Data Parity Error Store PCI Memory 6 Inject an Address Parity Error Inject a TLP ECRC Error (optional) 7 Inject a Data Parity Error PCI I/O 8 Inject an Address Parity Error 9 Inject a Data Parity Error PCI Configuration 10 Inject an Address Parity Error 11 Inject a Data Parity Error DMA read PCI Memory 12 Inject an Address Parity Error Inject a TLP ECRC Error 13 Inject a Data Parity Error 14 Inject a Master Abort (no response to IOA) Error -- 15 Inject a Target Abort Inject a Completer Abort or Unsupported Request Inject the error that is injected on a TCE Page Fault. 
DMA write PCI Memory 16 Inject an Address Parity Error Inject a TLP ECRC Error 17 Inject a Data Parity Error 18 Inject a Master Abort (no response to IOA) Error -- 19 Inject a Target Abort Not Applicable
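The cell values 0 through 19 in the table above follow a regular pattern, which the sketch below rebuilds as a lookup table. The key strings ("load", "memory", "address-parity", and so on) are hypothetical labels chosen for this example, and the error names follow the conventional PCI and PCI-X column rather than the PCI Express one.

```python
FUNCTION_CODES = {}
_code = 0

# Load and Store: three address spaces, two error kinds each (values 0-11).
for _op in ("load", "store"):
    for _space in ("memory", "io", "config"):
        for _err in ("address-parity", "data-parity"):
            FUNCTION_CODES[(_op, _space, _err)] = _code
            _code += 1

# DMA read and DMA write: PCI Memory only, four error kinds each (values 12-19).
for _op in ("dma-read", "dma-write"):
    for _err in ("address-parity", "data-parity",
                 "master-abort", "target-abort"):
        FUNCTION_CODES[(_op, "memory", _err)] = _code
        _code += 1


def function_code(operation, address_space, error):
    """Return the sixth-word (or eighth-word) value for one table row."""
    return FUNCTION_CODES[(operation, address_space, error)]
```

Values 20 and 21 are control requests rather than injections, so they are deliberately left out of the table.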

Platform Implementation Notes: Platforms that implement LPAR normally do not allow any partition to be configured to perform platform-specific errinjct calls since they are capable of crashing the entire complex. However, the platform should provide special hidden overrides for laboratory testing purposes. Software and Firmware Implementation Notes: When a call to ibm,errinjct results in an error injected into a processor, then the error is injected on the same processor as the one that called the ibm,errinjct RTAS call, not the processor that called the ibm,open-errinjct. The OS could call ibm,open-errinjct, ibm,errinjct, and ibm,close-errinjct from three different processors. For usability reasons, the ibm,close-errinjct RTAS call should do a reasonable amount of cleanup, turning off error injection where it can. However, since the ERRINJCT option is intended for internal use (that is, not intended to be productized) and since software is allowed to set an essentially unlimited number of error injections between the calls to ibm,open-errinjct and ibm,close-errinjct, the firmware may vary by implementation as to what is cleaned up and what is not. An example of something that might be very difficult to clean up is injection of memory errors. Something that might be easier is to turn off the error injection in all bridges to which the caller has access. Users of the ERRINJCT option should consult the implementation documentation for a particular platform to learn about the level of cleanup that is done in the ibm,close-errinjct call for that implementation. In the severe case, a reboot may even be necessary after the ibm,close-errinjct in order to clear the error. In other cases it may be possible for the caller to partially disable an error that it has set by setting a benign error (for example, in the PCI error injection case, by setting the error injection for a bus that was previously set to inject an error to an address that will never occur to that IOA). 
Test developers are encouraged not to extensively use the platform-specific option to this function. In general, platform-specific implementation options are not carried forward to new platforms.

Firmware Assisted Non-Maskable Interrupts Option (FWNMI) The FWNMI option provides firmware support for System Reset interrupts and platform dependent error recovery for recoverable machine checks. The firmware gets control on a non-maskable interrupt (NMI), analyses the condition, and, if the processor was not running inside the hypervisor, reports its findings to the OS. The OS registers system reset and machine check handlers by issuing either the ibm,nmi-register or ibm,nmi-register-2 RTAS call. In addition, with these calls the OS permanently relinquishes to firmware the Machine State Register’s Machine Check Enable bit, the two hundred fifty six (256) bytes of the System Reset Interrupt vector starting at real location 0x100, the two hundred fifty six (256) bytes of the Machine Check Interrupt vector starting at real location 0x200, as well as the storage page starting at real location 0x7000. The RTAS firmware records the entry points of the OS notification routines to call to report the results of the firmware’s analysis and any attempted recovery should the hardware signal a machine check or system reset interrupt. The results of an error analysis are reported via a standard error log structure as defined in . The storage containing the error log structure is subsequently released back to firmware use by the OS after it has completed its event handling by the issuance, from the interrupted processor, of the ibm,nmi-interlock RTAS call. Multiple processors of the same OS image may experience fatal events at, or about, the same time. The first processor to enter the machine check handling firmware reports the fatal error. Subsequent processors serialize waiting for the first processor to issue the ibm,nmi-interlock call. These subsequent processors report “fatal error previously reported”. 
If, after the firmware makes a Machine Check call back, and before the OS issues the ibm,nmi-interlock call, the same processor that is currently holding the storage containing the error log structure receives another Machine Check NMI, the firmware has no choice but to declare the condition fatal, log the result and execute the partition’s reboot policy. When the OS gets control after a machine check, at its registered machine check notification routine, all architected processor registers have been restored to the values they contained when the firmware was notified of the interrupt, except for register R3 which contains a real address that points to a 16 byte structure. The first 8 bytes of this area contain the original contents of R3 and the second 8 bytes contain the fixed portion of the standard error log structure. If firmware is able to immediately make a repair determination, the fixed portion indicates that an additional variable part is present and follows the fixed part per the standard error log structure. For some other errors, the determination of the repair action is delayed, and the firmware reports these determinations asynchronously to handling the machine check. The repair action log is queued in the NVRAM and is reported either in a subsequent event-scan if the OS image remains operational, or on a subsequent boot. In no case does the OS call check-exception in its machine check notification routine. The difference between ibm,nmi-register and ibm,nmi-register-2 is that ibm,nmi-register allocates the error reporting structure in RTAS space while ibm,nmi-register-2 places the error reporting structure in real page 7. New OS designs should use ibm,nmi-register since support for ibm,nmi-register-2 will be terminated at some future date. As with all first level interrupt service routines, the SPRG-2 register is used to save the state of one general purpose register while the processor computes the location of its state save area. 
Implementation Note: An acceptable non-LPAR firmware implementation for the NMI check handlers saves one register in SPRG-2. Then, using the processor number register, it determines an offset into a page 7 table of addresses to the start of a per processor RTAS save area (only a single register need be saved per processor), and acquires a lock located in page 7 to serialize the use of the RTAS state save area among potentially competing processors. The MSR[ME] bit then prevents single processor Machine Check stacking in the interval between the Machine Check call back and the ibm,nmi-interlock call. LPAR implementations should minimize potential effects to innocent partitions due to Machine Check Interrupts affecting other partitions. If the NMI was taken inside the hypervisor, then, if the firmware determines that the condition is recoverable, the hypervisor recovery routine is invoked. If the condition is not recoverable, hypervisor clean up routines establish a safe state and mark the hypervisor return routine to invoke the proper OS registered NMI routine rather than doing the standard hypervisor return. R1--1. All platforms must implement the FWNMI option. R1--2. For the FWNMI option: The platform must include the “ibm,nmi-register” RTAS function property name in the OF /rtas node. R1--3. For the FWNMI option: The platform must include the “ibm,nmi-register-2” RTAS function property name in the OF /rtas node if the platform requires support from interim OS versions. R1--4. For the FWNMI option: RTAS must implement the ibm,nmi-register and/or ibm,nmi-register-2 calls as appropriate per Requirements and using the argument buffer defined by . Argument Call Buffer <emphasis>ibm,nmi-register or ibm,nmi-register-2</emphasis> Parameter Type Name Values In Token Token for ibm,nmi-register or ibm,nmi-register-2 Number of Inputs 2 Number of Outputs 1 System Reset Notification Routine Real/Logical address of OS routine to call on a System Reset (in the first 32 MB of memory). 
Machine Check Notification Routine Real/Logical address of OS routine to call on a Machine Check (in the first 32 MB of memory). Out Status 0: Success -1: Hardware Error -3: Parameter Error

R1--5. For the FWNMI option: Once the OS has registered for NMI notification, it must not change the contents of the two hundred fifty six (256) bytes of the NMI interrupt vectors at real locations 0x100 or 0x200 or the memory page starting at real location 0x7000. R1--6. For the FWNMI option: The Real/Logical address of the registered OS Machine Check and System Reset routines must be in the first 32 MB of the OS’s memory address space. Software Implementation Note: Requirement ensures that the registered OS Machine Check and System Reset routines are within the code’s RMA. R1--7. For the FWNMI option: If the OS registered with ibm,nmi-register, firmware must not store the state of the processor at the time of interrupt in interrupt vectors at locations 0x100 or 0x200 or the memory page starting at real location 0x7000. Firmware may use RTAS space to store such state data. R1--8. For the FWNMI option: Once the OS has registered for NMI notification, the platform firmware must intercept all System Reset Interrupts on all of the OS’s processors. R1--9. For the FWNMI option: The platform firmware, for those intercepted System Reset interrupts which platform policy dictates are to be forwarded to the OS, must invoke the OS registered System Reset Interrupt notification point with translate off and all other architected processor registers restored to their state at the time of the System Reset. R1--10. For the FWNMI option: Once the OS has registered for NMI notification, the platform firmware must intercept all Machine Check Interrupts on all of the OS’s processors. R1--11. For the FWNMI option: The platform must provide a mechanism for the firmware to signal a non-maskable interrupt to each processor in a partition. R1--12. 
For the FWNMI option: The platform firmware must analyze all intercepted Machine Check Interrupts, determine if the OS may safely continue using the platform, attempt to recover any corrupted architectural state, and report the results of the recovery attempt to the OS. R1--13. For the FWNMI option: If the platform firmware, on analyzing an intercepted Machine Check Interrupt, determines that the OS may safely continue using the platform, it must invoke the OS registered Machine Check Interrupt notification point with translate off but all other architected processor registers restored to their state at the time of the Machine Check except that General Purpose Register (GPR) R3 contains the real address of a 16 byte memory buffer containing the original contents of GPR R3 in the first 8 bytes and the RTAS Error Log (fixed part) (per ) in the second 8 bytes. R1--14. For the FWNMI option: The maximum time for the platform’s processing of a non-fatal machine check interrupt must be on the order of that taken by the check-exception critical call. R1--15. For the FWNMI option: Once the firmware has reported a “fatal” machine check event to an OS image it must only report “fatal error previously reported” (see ) in response to machine checks on any processor belonging to that image. R1--16. For the FWNMI option: If the platform firmware, on analyzing an intercepted Machine Check Interrupt, determines that the OS may not safely continue using the processor (for example a check stop will certainly result), it must select one of the implementation options given in . Unsafe Processor Recovery Options Option Number Implementation Option for handling an unsafe processor. 1 Invoke the registered Machine Check Interrupt notification point on a spare processor which platform firmware substitutes for the offending processor. Note: Firmware must adjust all interrupt XIVT entries and APM registers, etc., so that the OS need not be aware of the processor substitution. 
The VPD of the new and old processors are different; the dynamic VPD collection RTAS call can be used to determine the new values. Since the results of this substitution are indicated as a non-fatal error to the OS, the substitution may take no more than 10 times the length of time of a critical check exception process. The firmware makes a best effort to load the decrementer with a value that represents the value in the failed processor at the time of the machine check minus a value that represents the time taken by the substitution process. 2 Mark the processor unsafe, do not return to the OS on that processor and notify the OS at the next event scan time with a fatal return message. Note: This action may cause the OS to “hang” due to locks held by the failing processor etc. that may cause a surveillance time out. The NVRAM firmware error log retains a trail of this condition for reading and logging at the subsequent OS boot. However, in those cases where a hang does not happen, the OS can select some other processor to pick up the thread of execution.

R1--17. For the FWNMI option: RTAS must implement the ibm,nmi-interlock call using the Argument buffer defined in which causes the release of the machine check work and reporting area in page 7. Argument Call Buffer <emphasis>ibm,nmi-interlock</emphasis> Parameter Type Name Values In Token Token for ibm,nmi-interlock Number of Inputs 0 Number of Outputs 1 Out Status 0: Success -1: Hardware Error

R1--18. For the FWNMI option: The ibm,nmi-interlock RTAS call must not require serialization with respect to any other RTAS or hypervisor calls. R1--19. For the FWNMI option: The processor receiving the nmi signal must, after it has processed the buffer pointed to by its R3 register, call the ibm,nmi-interlock RTAS call.
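The handling order implied by the requirements above (read the 16 byte buffer that R3 points to, then release it with ibm,nmi-interlock) can be sketched as follows. This is only an illustration under stated assumptions: the buffer is modeled as a Python bytes object rather than real memory, the original R3 value is assumed to be stored big-endian, and `nmi_interlock` is a hypothetical callable standing in for the RTAS call.

```python
import struct


def parse_machine_check_buffer(buf):
    """Split the 16 byte area into (original R3, fixed part of the error log)."""
    if len(buf) < 16:
        raise ValueError("machine check buffer must be at least 16 bytes")
    original_r3, fixed_log = struct.unpack(">Q8s", bytes(buf[:16]))
    return original_r3, fixed_log


def handle_machine_check(buf, nmi_interlock):
    """Copy what is needed out of the buffer, then release it to firmware.

    nmi_interlock is a hypothetical callable that issues ibm,nmi-interlock
    on the interrupted processor and returns its Status.
    """
    original_r3, fixed_log = parse_machine_check_buffer(buf)
    # The copy is taken before the interlock call because the storage
    # belongs to firmware again once the call has been issued.
    saved_log = bytes(fixed_log)
    status = nmi_interlock()
    return original_r3, saved_log, status
```

A real handler would also follow the variable part of the error log when the fixed part indicates one is present; that step is omitted here because the log format is defined elsewhere.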

Memory Statistics Depending upon the platform configuration, various portions of installed platform memory are in one of several states. Some memory may be mapped out of the address space due to an error in one or more locations. Other memory is used by the platform firmware. What is left is allocated to one or more logical partitions or held in reserve. The usage of memory is a first order platform management parameter, and is needed by platform managers. However, it may also become a covert channel between logical partitions. Therefore, the memory usage information that is surfaced to an OS image by firmware is restricted to total platform memory installed, plus three sub-divisions which total to the total memory installed. These three sub-divisions are the total memory the platform mapped out due to hardware failure, total memory reserved for platform firmware and other partitions, and the memory allocated to the calling OS image. LPAR machines can provide a more detailed memory usage report via their Hardware Management Console. The total memory allocated to the calling OS image is obtained through the device tree (potentially modified by post boot dynamic reconfiguration).

System Parameters Option The system parameters which are defined are shown in . Defined Parameters Parameter token Parameter Description Values Notes 0 HMC 0 1 HMC 1 2 thru 15 HMC 2 thru 15 16 Reserved 17 Reserved 18 Processor CoD Capacity Card Info 1 19 Memory CoD Capacity Card Info 1 20 SPLPAR Characteristics Opaque ASCII NULL terminated string 1, 3 21 partition_auto_restart 22 platform_auto_power_restart 23 sp-remote-pon Remote Power On (see ) One byte decimal 0 (for off) 1 (for on) Default 0 24 sp-rb4-pon Number of rings until power on (see ) One byte decimal 25 sp-snoop-str Snoop sequence string (see ) 26 sp-serial-snoop Serial snoop enable/disable (see ) 0 (for off) 1 (for on) Default 0 27 sp-sen Surveillance enable/disable (see ) 0 (for off) 1 (for on) Default 0 28 sp-sti Surveillance time interval in minutes (see ) 1-255 Default 5 29 sp-sdel Surveillance delay in minutes (see ) 1-120 Default 10 30 sp-call-home 31 sp-current-flash-image 0 (for perm) 1 (for temp) 32 platform-dump-max-size 64-bit integer The value consists of a 32-bit high value followed by a 32-bit low value. The resulting 64-bit value is unsigned. 33 epow3-quiesce-time 0-65535 seconds Default 0 34 memory-preservation-boot-time 0-65535 seconds Default 0 35 SCSI Initiator Identifier 36 AIX support 37 Enhanced Processor CoD Capacity information 1 38 Enhanced Memory CoD Capacity information 1 39 CoD options See . 1 40 Platform Error Classification See . 41 Firmware Boot Options See . 42 platform-processor-diagnostics-run-mode One byte decimal 0=disabled 1=staggered 2=immediate 3=periodic 43 Processor Module Information See . 1 44 Cooperative Memory Over-commitment Definitions Opaque ASCII NULL terminated string 1, 4 45 Cede Latency Settings Information 46 Target Active Memory Compression Factor Field length: 2 bytes Format: binary Range: 100 -- 1000 47 Performance boost modes vector See . From ibm,get-system-parameter the field length is 96 bytes consisting of 3 32 byte bit vectors. 
To ibm,set-system-parameter the field is a single 32 byte bit vector 48 Reserved 49 Reserved 50 TLB Block Invalidate Characteristics Variable Length Series of Bytes See 51 Reserved 52 Energy Management Tuning Parameters Series of 8 byte entries of bytes encoding the tuning parameters supported by the system See 1 53 Firmware Service Expiration Date This is the date a system's system firmware service warranty period expires. 8-character null-terminated ASCII string in YYYYMMDD format 1 54 Firmware Service Entitlement Activation Key This is the activation key used to set or extend a system's firmware service warranty period. 34-character null-terminated ASCII string key value 2 55 LPAR Name Logical Partition name Null-terminated ASCII string 1 >55 Reserved

Notes: 1. These system parameters are defined for the ibm,get-system-parameter RTAS call only. An attempt to set them using the ibm,set-system-parameter RTAS call results in a return Status of -9002 (Setting not allowed/authorized). 2. Used by ibm,set-system-parameter; not supported for ibm,get-system-parameter. 3. The format of the SPLPAR string is beyond the scope of this architecture. See also . 4. See . Further parameters will be defined as required. R1--1. All platforms must support the System Parameters option. R1--2. (Requirement Number Reserved For Compatibility) R1--3. For the System Parameters option: If the length of the data for a parameter in is less than what is specified in the requirements for a parameter or if the data value in an ibm,set-system-parameter RTAS call is other than what is allowed by the requirements for the parameter, the platform must return a -9999 indicating a parameter error. R1--4. For the System Parameters option: The default values defined for parameters sp-sen, sp-sti and sp-sdel in the must apply to the platform prior to any ibm,set-system-parameter RTAS call. R1--5. For the System Parameters option: The ibm,get-system-parameter RTAS call must implement the argument call buffer defined by . If the ibm,set-system-parameter RTAS call is implemented, it must use the argument call buffer defined by . R1--6. For the System Parameters option: If the platform implements the ibm,set-system-parameter RTAS call it must also implement the ibm,get-system-parameter RTAS call. R1--7. For the System Parameters option: A system parameter, which is not supported by the system, must return a Status of -3 (System parameter not supported) from the RTAS call. R1--8. For the System Parameters option: A system parameter for which access is not authorized, must return a Status of -9002 (Not authorized) from the RTAS call. R1--9. 
For the System Parameters option: When a platform implements a system parameter, it must meet the definition in including applicable descriptions and notes. R1--10. For the System Parameters option: An ibm,get-system-parameter RTAS call with a buffer length of zero (0) must return a Status of 0 (success) if the parameter is supported and authorized, a Status of -3 if not supported, or a Status of -9002 if not authorized. R1--11. For the System Parameters option: An ibm,set-system-parameter RTAS call with a parameter length of zero (0) must return a Status of 0 (success) if the parameter is supported and authorized, a Status of -3 if not supported, or a Status of -9002 if not authorized. Programming Note: A partition may lose or gain authority for an ibm,get-system-parameter or ibm,set-system-parameter call dynamically. For instance, three consecutive calls with the same parameters could return Status of success, not authorized, and success. R1--12. For the System Parameters option: The platform must enforce the length of system parameter strings as follows: input strings to ibm,set-system-parameter must not exceed 1024 bytes in length, else the platform returns a Status of -9999 (parameter error) from the RTAS call; output strings from ibm,get-system-parameter must not exceed 4000 bytes. R1--13. For the System Parameters option with the SPLPAR option: The Platform must implement parameter token 20 as defined in for ibm,get-system-parameter. Implementation Note: The OS is allowed to provide and specify a buffer that is larger than the maximum system parameter length.

<emphasis>ibm,get-system-parameter</emphasis> Argument Call Buffer <emphasis>ibm,get-system-parameter</emphasis> Parameter Type Name Values In Token Token for ibm,get-system-parameter Number Inputs 3 Number Outputs 1 Parameter Token of system parameter to retrieve buffer Real address of data buffer length length of data buffer Out Status 0: Success -1: Hardware Error -2: Busy, Try again later -3: System parameter not supported -9002: Not authorized -9999: Parameter Error 990x: Extended delay where x is a number 0-5 (see text below)

The ibm,get-system-parameter RTAS call fetches the data for the selected parameter and places it at the address specified in the buffer operand. The first two (2) bytes of the data in the buffer are the length of the returned data, not counting these first two (2) bytes. The length of string data includes the length of the NULL but excludes the length field. If the buffer length is less than the returned data length, the data is truncated at the end of the buffer. The maximum length of the input parameter data string for ibm,set-system-parameter is architecturally limited to 1024 bytes of data and 2 bytes of length, totaling 1026 bytes. The maximum length of the output parameter data string for ibm,get-system-parameter is architecturally limited to 4000 bytes of data and 2 bytes of length, totaling 4002 bytes. The only currently valid parameters are as specified in . When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling ibm,get-system-parameter with the same parameter index. However, software may issue the ibm,get-system-parameter call again either earlier or later than this.
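The buffer layout and the suggested 990x delay described above can be sketched in a few lines. This is a minimal illustration, not platform code: the function names are hypothetical, and the two-byte length field is assumed to be big-endian, which the text above does not state explicitly.

```python
import struct


def parse_system_parameter(buffer: bytes) -> bytes:
    """Return the parameter data held in a get-system-parameter buffer."""
    if len(buffer) < 2:
        raise ValueError("buffer too short to hold the 2-byte length field")
    # Assumption: the 2-byte length field is big-endian.
    (declared,) = struct.unpack_from(">H", buffer, 0)
    # The length excludes the length field itself. Slicing also copes with
    # data that was truncated at the end of the caller's buffer.
    return buffer[2:2 + declared]


def extended_delay_ms(status: int):
    """Suggested delay in milliseconds for a 990x Status, else None."""
    if 9900 <= status <= 9905:
        return 10 ** (status - 9900)  # 10 raised to the x, x = last digit
    return None
```

A caller might check extended_delay_ms(status) first and, when it is not None, wait roughly that long before reissuing the call with the same parameter; as the text notes, retrying earlier or later is also permitted.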

<emphasis>ibm,set-system-parameter</emphasis>

Argument Call Buffer <emphasis>ibm,set-system-parameter</emphasis>

Parameter Type | Name | Values
In | Token | Token for ibm,set-system-parameter
In | Number Inputs | 2
In | Number Outputs | 1
In | Parameter | Token number of the target system parameter
In | buffer | Real address of data buffer
Out | Status | 0: Success; -1: Hardware Error; -2: Busy, Try again later; -3: System parameter not supported; -9002: Setting not allowed/authorized; -9999: Parameter Error; 990x: Extended delay where x is a number 0-5 (see text below)

The ibm,set-system-parameter RTAS call fetches the data from the address specified in the buffer operand and sets it into the system parameter specified by the Parameter operand. The first two (2) bytes of the data in the buffer are the length of the data, not counting these first two (2) bytes. The length of string data includes the length of the NULL but excludes the length field. The only currently valid parameters are as specified in . When the 990x Status is returned, it is suggested that software delay for 10 raised to the x milliseconds (where x is the last digit of the 990x return code), before calling ibm,set-system-parameter with the same parameter index. However, software may issue the ibm,set-system-parameter call again either earlier or later than this.
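As an illustration of the input layout described above, the following sketch builds the buffer for a string-valued parameter. The helper name is hypothetical and the length field is assumed to be big-endian; the 1024-byte limit is the architectural limit on input data.

```python
import struct

MAX_SET_DATA = 1024  # architectural limit on input data, excluding the length field


def build_set_buffer(text: str) -> bytes:
    """Build a set-system-parameter buffer for an ASCII string value."""
    data = text.encode("ascii") + b"\x00"  # string length includes the NULL
    if len(data) > MAX_SET_DATA:
        raise ValueError("parameter data exceeds 1024 bytes")
    # Assumption: the 2-byte length field is big-endian; it excludes itself.
    return struct.pack(">H", len(data)) + data
```

Checking the length before the call avoids a round trip that would otherwise end in a Status of -9999.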

HMC Parameter The full HMC parameter data string is returned when the ibm,get-system-parameter RTAS call is issued. HMC parameter contents are: length - two byte binary length of data string associated with the HMC. System Parameter Data - string of semicolon delimited ASCII data. The ibm,set-system-parameter RTAS call is not required to support the HMC parameter since the HMC parameter data is set only by the HMC through an out-of-band path. The ibm,get-system-parameter RTAS call is provided for reading the parameter data. The RTAS call is available in both LPAR mode and non-LPAR mode. R1--1. For the HMC Parameter: If the ibm,set-system-parameter RTAS call is provided, the use of the HMC parameters, 0-15, must always return not authorized Status, -9002. R1--2. For the HMC Parameter: The format of each HMC system parameter supported by this system must consist of a two byte binary length field describing the full length of the parameter data, followed by a series of variables where each variable is of the form “keyword” followed by “=” followed by “value” and terminated by “;”. The order of the variables is undefined and the total number of variables is undefined. The platform must provide the “HscIPAddr” keyword and the “RMCKey” keyword. (The acronym “HSC” was replaced by “HMC”, but this keyword was retained with “Hsc” so as not to invalidate code already created using the keyword.) The data after the equal sign (=) may or may not have content. The value of the “HscIPAddr” keyword is the IP address of that HMC. The value of the “RMCKey” keyword is either null or is the RMC key for that system or partition. The value of a keyword is null if there is nothing between the “=” and the “;”. The update state of the keyword values of an HMC system parameter is uncertain when the HMC stops talking to the managed system. R1--3. For the HMC Parameter: All HMC system parameter data must be printable ASCII characters, excluding the two byte binary length field.
R1--4. For the HMC Parameter: The lowest valued HMC system parameter which returns a -3 Status must have no higher valued HMC system parameter which is supported. That is, a scan of HMC system parameters from 0 until the first -3 Status must indicate all supported HMC system parameters. R1--5. For the HMC Parameter: If there is no HMC control of this platform, the platform must return a null response, zero length data, to requests for all supported HMC system parameters. Implementation Note: Since the system is not necessarily HMC controlled, it is shipped with the HMC parameter set to zero (0) length. If the system is HMC controlled, the HMC passes the parameter values to the system at boot time so that the ibm,get-system-parameter RTAS call indicates HMC control. If the HMC is deconfigured, the HMC can write zero (0) length data to the system. If that is not done, the system can write the parameter to zero length on a hard reset and the HMC, if present, then initializes the data. R1--6. For the HMC Parameter: The platform must truncate the HMC system parameter data at the buffer length if the buffer length is less than the data length plus 2.
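The “keyword=value;” format and the scan for supported HMC parameters could be handled along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the input is the parameter data with the two-byte length field already removed, values are assumed not to contain “;”, and get_status stands in for whatever mechanism the caller uses to obtain the Status of an ibm,get-system-parameter call for a given token.

```python
def parse_hmc_parameter(data: str) -> dict:
    """Split 'keyword=value;' variables into a dict; null values become None."""
    variables = {}
    for item in data.split(";"):
        if not item:
            continue  # skip the empty piece after the final ';'
        keyword, _, value = item.partition("=")
        variables[keyword] = value or None
    return variables


def supported_hmc_parameters(get_status) -> list:
    """Scan HMC parameter tokens 0-15 and stop at the first -3 Status."""
    supported = []
    for token in range(16):
        if get_status(token) == -3:
            break  # no higher valued HMC parameter is supported
        supported.append(token)
    return supported
```

Because the order and number of variables are undefined, a dictionary lookup by keyword is safer than relying on position.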

Capacity on Demand (CoD) Option Platforms may optionally provide mechanisms for securely licensing a subset of the platform’s physically installed resources for use. The CoD option includes system parameters relating to the CoD Capacity “smart card” which is used to securely store and validate the license information. Dynamically adding memory and CPU resources to running partitions requires the CoD option combined with the Logical Resource Dynamic Reconfiguration option. Additionally, platforms may provide a provisional CoD activation mode known as “Trial CoD”. This mode provides immediate availability of resources while the permanent license is on order. The CoD resources are made available for a platform dependent period of power on hours. If the platform implements the CoD sparing option and the platform predicts the failure of a CoD resource, given that there is spare capacity of that resource, the platform makes available a spare resource so that the OS can migrate work off the failing resource and return the failing resource to the platform. If the OS takes advantage of this sparing by actually using the available resource, the OS is using resources in excess of the permanently licensed entitlement until the failing CoD resource is returned to the platform. R1--1. For the CoD option: The platform must support the System Parameter option (ibm,get-system-parameter) along with Parameter tokens 18 and 19 as described in .

CoD Capacity Card Info Software Note: System parameters 18 and 19 present only permanently activated capacity. These parameters will be removed at some point in the future. OSs should begin using the enhanced CoD Capacity parameters. These two read only system parameters (one for memory and another for processors) are ASCII hexadecimal strings representing the current licensed entitlement of CoD resources of their respective types. These strings contain 9 packed fields as presented in .

CoD Capacity Card Info String Packed Fields

Field Number | Definition
1 | System type (4 ASCII characters) (4 bytes)
2 | System serial number (8 ASCII characters: pp-sssss) (8 bytes)
3 | CoD capacity card Custom Card Identification Number (CCIN) (4 ASCII characters) (4 bytes)
4 | CoD capacity card serial number (10 ASCII characters: pp-sssssss) (10 bytes)
5 | CoD capacity card unique ID (16 ASCII characters) (16 bytes)
6 | CoD resource identifier (4 ASCII characters) (4 bytes)
7 | Quantity of Activated CoD resource (4 characters ASCII) (4 bytes)
8 | CoD sequence number (4 numeric ASCII characters) (4 bytes)
9 | CoD activation code entry check (1 byte hex check sum, two ASCII characters - based on EBCDIC representation of items: 1, 2, 3, 4, 5, 6, 7, and 8) (2 bytes)
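The nine packed fields can be split by their fixed widths. The sketch below assumes the fields are packed back to back in field-number order with no separators; the field names and the function name are illustrative only, and the check sum is not verified because its exact computation is not given here.

```python
# (illustrative name, width in bytes) for fields 1 through 9
COD_FIELDS = [
    ("system_type", 4),
    ("system_serial", 8),
    ("card_ccin", 4),
    ("card_serial", 10),
    ("card_unique_id", 16),
    ("resource_id", 4),
    ("activated_quantity", 4),
    ("sequence_number", 4),
    ("entry_check", 2),
]


def parse_cod_capacity_card_info(text: str) -> dict:
    """Split a CoD Capacity Card Info string into its nine packed fields."""
    expected = sum(width for _, width in COD_FIELDS)  # 56 bytes in total
    if len(text) < expected:
        raise ValueError(f"expected at least {expected} characters")
    fields = {}
    position = 0
    for name, width in COD_FIELDS:
        fields[name] = text[position:position + width]
        position += width
    return fields
```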

R1--1. For the CoD option: The platform’s ibm,get-system-parameter RTAS call, specifying the CoD Capacity Card Info, must, upon successful completion, return the ASCII representation of the information defined in for the managed CoD resource type specified by the system parameter token. R1--2. For the CoD option: The platform’s ibm,set-system-parameter RTAS call specifying the CoD Capacity Card Info must not return a Status of 0 (success); the expected return is a Status of -9002 (Setting not allowed/authorized), however, under special cases a Status of -1 (Hardware error), or one of the Busy or Extended Delay Status values is allowed.

Predictive Failure Sparing with Free Resources A platform may optionally provide an unused resource to a partition that is notified of a predictive failure. This allows the partition’s OS to transparently substitute the spare resource for the failing one in some situations. To take advantage of this situation, the partition’s OS queries the free DR slot(s) of the resource type to determine if a spare resource is available, and if so uses the other DR RTAS calls to acquire the resource. In some cases resources are free because they have not been assigned to partitions. A platform may optionally provide an unused CoD resource to a partition as a predictive failure spare. In such cases, a get-sensor-state (entity-sense) for the DR slot returns the state of “exchange”. Between the time that the OS takes ownership of the available spare CoD resource, via set-indicator (allocation-state, exchange), and the time that the OS gives up the failing resource, the platform exceeds the licensed entitlement for that resource. R1--1. For the Predictive Failure Sparing option: The platform, upon provisionally making available a spare CoD resource in response to a predictive failure, must set the CoD Resource Provisional Activation timer, to time out the use of the provisionally activated excess resources.

Enhanced CoD Capacity Info These two read only system parameters (one for processor and one for memory) return ASCII hexadecimal strings representing the current licensed CoD resources. The strings are constructed with a fixed “base” section followed by zero or more optional sections. The definitions below show all optional sections. The caller should not expect the presence or order of all optional sections. Each optional section starts with the following 3 members: offset to next section (zero for last section); size in bytes of the section (including these three members); name of the section. The specific meaning of the members of each section is beyond the scope of this architecture; refer to the specific platform design documents. All data in these tables are composed of printable ASCII characters. There are no NULLs or other non-printable characters. Programming Note: On a platform where the base processor capacity is a fraction of a full processor, the data in the BaseProc section below is rounded up to the next larger whole number. R1--1. For the CoD option: The platform's ibm,get-system-parameter RTAS call, specifying the Enhanced CoD Processor Capacity Info, must, upon successful completion, return the ASCII representation of the information defined in for managed CoD processor resources.

Enhanced CoD Processor Capacity Info, Version 1

Offset | Size in Bytes | Section | Description | Format
0 | 4 | Meta | Table version indicator | 4 ASCII characters “V1 ”
4 | 4 | Meta | Decimal number of optional sections | 4 ASCII numeric characters
8 | 4 | Base | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
12 | 4 | Base | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “78 ”
16 | 8 | Base | Section Name | 8 ASCII characters “BASE ”
24 | 4 | Base | system type | 4 ASCII characters
28 | 10 | Base | System serial number | 10 ASCII characters: “pp-ssssss ”
38 | 4 | Base | CoD capacity card CCIN | 4 ASCII characters
42 | 10 | Base | CoD capacity card serial number | 10 ASCII characters “hh-hssssss”
52 | 16 | Base | CoD capacity card unique ID | 16 ASCII characters
68 | 4 | Base | CoD resource identifier | 4 ASCII characters
72 | 4 | Base | Quantity of permanently activated resources | 4 ASCII characters
76 | 4 | Base | CoD sequence number | 4 numeric ASCII characters
80 | 2 | Base | CoD activation code entry check | 1 byte hex check sum, 2 ASCII characters -- based on EBCDIC representation of previous 8 entries.
82 | 4 | Base | Total CoD resources installed in system | 4 numeric ASCII characters

On/Off Processor Resources

0 | 4 | OnOffPrc | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
4 | 4 | OnOffPrc | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “66 ”
8 | 8 | OnOffPrc | Section Name | 8 ASCII characters “ONOFFPRC”
16 | 1 | OnOffPrc | On/Off CoD enabled | 1 ASCII character '0' or '1'
17 | 1 | OnOffPrc | On/Off CoD active | 1 ASCII character '0' or '1'
18 | 4 | OnOffPrc | On/Off CoD feature | 4 ASCII characters
22 | 4 | OnOffPrc | On/Off CoD activated resources | 4 ASCII numeric characters
26 | 4 | OnOffPrc | On/Off CoD sequence number | 4 ASCII numeric characters
30 | 2 | OnOffPrc | On/Off CoD checksum | 2 ASCII characters
32 | 4 | OnOffPrc | On/Off CoD resources requested | 4 ASCII numeric characters
36 | 4 | OnOffPrc | On/Off CoD days requested | 4 ASCII numeric characters
40 | 4 | OnOffPrc | On/Off CoD resource days expired | 4 ASCII numeric characters
44 | 4 | OnOffPrc | On/Off CoD resource days remaining | 4 ASCII numeric characters
48 | 4 | OnOffPrc | On/Off CoD counter | 4 ASCII numeric characters
52 | 4 | OnOffPrc | On/Off standby resources available | 4 ASCII numeric characters
56 | 1 | OnOffPrc | On/Off reserved byte | 1 ASCII blank
57 | 4 | OnOffPrc | On/Off history of requested resource days | 4 ASCII characters
61 | 1 | OnOffPrc | On/Off reserved byte | 1 ASCII blank
62 | 4 | OnOffPrc | On/Off history of unreturned resource days | 4 ASCII characters

Debit Processor Resources

0 | 4 | DebitPrc | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
4 | 4 | DebitPrc | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “82 ”
8 | 8 | DebitPrc | Section Name | 8 ASCII characters “DEBITPRC”
16 | 1 | DebitPrc | Debit CoD enabled | 1 ASCII character '0' or '1'
17 | 1 | DebitPrc | Debit CoD active | 1 ASCII character '0' or '1'
18 | 4 | DebitPrc | Debit CoD feature | 4 ASCII characters
22 | 4 | DebitPrc | Debit CoD activated resources | 4 ASCII numeric characters
26 | 4 | DebitPrc | Debit CoD sequence number | 4 ASCII numeric characters
30 | 2 | DebitPrc | Debit CoD checksum | 2 ASCII characters
32 | 4 | DebitPrc | Debit CoD resources requested | 4 ASCII numeric characters
36 | 12 | DebitPrc | Debit Reserved | 12 ASCII blanks
48 | 4 | DebitPrc | Debit counter | 4 ASCII characters
52 | 4 | DebitPrc | Debit standby resources available | 4 ASCII numeric characters
56 | 1 | DebitPrc | Debit reserved byte | 1 ASCII blank
57 | 4 | DebitPrc | Debit history of expired resource days | 4 ASCII numeric characters
61 | 1 | DebitPrc | Debit reserved byte | 1 ASCII blank
62 | 4 | DebitPrc | Debit history of unreturned resource days | 4 ASCII characters
66 | 8 | DebitPrc | Extended total history of requested On/Off Processor days | 8 ASCII characters
74 | 8 | DebitPrc | Extended total history of unreturned On/Off Processor days | 8 ASCII characters

Trial Processor Resources

0 | 4 | TrialPrc | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
4 | 4 | TrialPrc | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “66 ”
8 | 8 | TrialPrc | Section Name | 8 ASCII characters “TRIALPRC”
16 | 1 | TrialPrc | Trial CoD enabled | 1 ASCII character '0' or '1'
17 | 1 | TrialPrc | Trial reserved | 1 ASCII blank
18 | 4 | TrialPrc | Trial CoD feature | 4 ASCII characters
22 | 4 | TrialPrc | Trial CoD activated resources | 4 ASCII numeric characters
26 | 4 | TrialPrc | Trial CoD sequence number | 4 ASCII numeric characters
30 | 2 | TrialPrc | Trial CoD checksum | 2 ASCII characters
32 | 8 | TrialPrc | Trial reserved bytes | 8 ASCII blanks
40 | 4 | TrialPrc | Trial days expired | 4 ASCII numeric characters
44 | 4 | TrialPrc | Trial days remaining | 4 ASCII numeric characters
48 | 14 | TrialPrc | Trial reserved bytes | 14 ASCII blanks
62 | 4 | TrialPrc | Trial unreturned resources | 4 ASCII numeric characters

Base Processor Resources

0 | 4 | BaseProc | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
4 | 4 | BaseProc | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “20 ”
8 | 8 | BaseProc | Section Name | 8 ASCII characters “BASEPROC”
16 | 4 | BaseProc | Number of Non-CoD “Base” processors on this system. NOTE: If this section is not present, there are no “base” processors and all processors are CoD activated. | 4 ASCII numeric characters
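Since callers should not rely on the presence or order of the optional sections, one way to read an Enhanced CoD string is to follow the offset-to-next-section member from section to section and index the sections by name. The sketch below is illustrative only: the function name and the sample data are invented, and it assumes the decimal fields are blank padded (as the “78 ” and “20 ” examples suggest).

```python
def walk_cod_sections(text: str):
    """Return (version, optional_count, {section name: section text})."""
    version = text[0:4].strip()
    optional_count = int(text[4:8])  # int() ignores blank padding
    sections = {}
    start = 8  # the base section follows the two Meta fields
    while True:
        if start + 16 > len(text):
            raise ValueError("section header runs past the end of the data")
        offset_to_next = int(text[start:start + 4])
        size = int(text[start + 4:start + 8])
        name = text[start + 8:start + 16].strip()
        sections[name] = text[start:start + size]
        if offset_to_next == 0:  # zero marks the last section
            return version, optional_count, sections
        start += offset_to_next


# Invented sample: a base section with a blank payload, then BASEPROC.
SAMPLE = (
    "V1  " + "0001"
    + "78  " + "78  " + "BASE    " + " " * 62
    + "0   " + "20  " + "BASEPROC" + "0004"
)
```

Looking sections up by name, for example sections.get("BASEPROC"), lets the caller treat a missing section as “not present” rather than as an error.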

R1--2. For the CoD option: The platform's ibm,get-system-parameter RTAS call, specifying the Enhanced CoD Memory Capacity Info, must, upon successful completion, return the ASCII representation of the information defined in for the managed CoD memory resources.

Enhanced CoD Memory Capacity Info, Version 1

Offset | Size in Bytes | Section | Description | Format
0 | 4 | Meta | Table version indicator | 4 ASCII characters “V1 ”
4 | 4 | Meta | Decimal number of optional sections | 4 ASCII numeric characters
8 | 4 | Base | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
12 | 4 | Base | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “78 ”
16 | 8 | Base | Section Name | 8 ASCII characters “BASE ”
24 | 4 | Base | system type | 4 ASCII characters
28 | 10 | Base | System serial number | 10 ASCII characters: “pp-ssssss ”
38 | 4 | Base | CoD capacity card CCIN | 4 ASCII characters
42 | 10 | Base | CoD capacity card serial number | 10 ASCII characters “hh-hssssss”
52 | 16 | Base | CoD capacity card unique ID | 16 ASCII characters
68 | 4 | Base | CoD resource identifier | 4 ASCII characters
72 | 4 | Base | Quantity of permanently activated resources | 4 ASCII characters
76 | 4 | Base | CoD sequence number | 4 numeric ASCII characters
80 | 2 | Base | CoD activation code entry check | 1 byte hex check sum, 2 ASCII characters -- based on EBCDIC representation of previous 8 entries.
82 | 4 | Base | Total CoD resources installed in system | 4 numeric ASCII characters

On/Off Memory Resources

0 | 4 | OnOffMem | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
4 | 4 | OnOffMem | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “67 ”
8 | 8 | OnOffMem | Section Name | 8 ASCII characters “ONOFFMEM”
16 | 1 | OnOffMem | On/Off CoD enabled | 1 ASCII character '0' or '1'
17 | 1 | OnOffMem | On/Off CoD active | 1 ASCII character '0' or '1'
18 | 4 | OnOffMem | On/Off CoD feature | 4 ASCII characters
22 | 4 | OnOffMem | On/Off CoD activated resources | 4 ASCII numeric characters
26 | 4 | OnOffMem | On/Off CoD sequence number | 4 ASCII numeric characters
30 | 2 | OnOffMem | On/Off CoD checksum | 2 ASCII characters
32 | 4 | OnOffMem | On/Off CoD resources requested | 4 ASCII numeric characters
36 | 4 | OnOffMem | On/Off CoD days requested | 4 ASCII numeric characters
40 | 4 | OnOffMem | On/Off CoD resource days expired | 4 ASCII numeric characters
44 | 4 | OnOffMem | On/Off CoD resource days remaining | 4 ASCII numeric characters
48 | 4 | OnOffMem | On/Off CoD counter | 4 ASCII numeric characters
52 | 4 | OnOffMem | On/Off standby resources available | 4 ASCII numeric characters
56 | 1 | OnOffMem | On/Off reserved byte | 1 ASCII blank
57 | 4 | OnOffMem | On/Off history of requested resource days | 4 ASCII characters
61 | 1 | OnOffMem | On/Off reserved byte | 1 ASCII blank
62 | 4 | OnOffMem | On/Off history of unreturned resource days | 4 ASCII characters
66 | 1 | OnOffMem | On/Off Memory Multiplier | 1 ASCII numeric character

Debit Memory Resources

0 | 4 | DebitMem | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
4 | 4 | DebitMem | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “83 ”
8 | 8 | DebitMem | Section Name | 8 ASCII characters “DEBITMEM”
16 | 1 | DebitMem | Debit CoD enabled | 1 ASCII character '0' or '1'
17 | 1 | DebitMem | Debit CoD active | 1 ASCII character '0' or '1'
18 | 4 | DebitMem | Debit CoD feature | 4 ASCII characters
22 | 4 | DebitMem | Debit CoD activated resources | 4 ASCII numeric characters
26 | 4 | DebitMem | Debit CoD sequence number | 4 ASCII numeric characters
30 | 2 | DebitMem | Debit CoD checksum | 2 ASCII characters
32 | 4 | DebitMem | Debit CoD resources requested | 4 ASCII numeric characters
36 | 12 | DebitMem | Debit Reserved | 12 ASCII blanks
48 | 4 | DebitMem | Debit counter | 4 ASCII characters
52 | 4 | DebitMem | Debit standby resources available | 4 ASCII numeric characters
56 | 1 | DebitMem | Debit reserved byte | 1 ASCII blank
57 | 4 | DebitMem | Debit history of expired resource days | 4 ASCII numeric characters
61 | 1 | DebitMem | Debit reserved byte | 1 ASCII blank
62 | 4 | DebitMem | Debit history of unreturned resource days | 4 ASCII characters
66 | 8 | DebitMem | Extended total history of requested On/Off Memory GB days | 8 ASCII characters
74 | 8 | DebitMem | Extended total history of unreturned On/Off Memory GB days | 8 ASCII characters
82 | 1 | DebitMem | Debit reserved byte | 1 ASCII blank

Trial Memory Resources

0 | 4 | TrialMem | Decimal offset in bytes from the start of this section to the start of the next section (that is, the offset from the first byte of this member to the first byte of the next section). Zero if the last section. | 4 ASCII numeric characters
4 | 4 | TrialMem | Section Length -- The length of this section in bytes (including offset member above) | 4 ASCII numeric characters “67 ”
8 | 8 | TrialMem | Section Name | 8 ASCII characters “TRIALMEM”
16 | 1 | TrialMem | Trial CoD enabled | 1 ASCII character '0' or '1'
17 | 1 | TrialMem | Trial reserved | 1 ASCII blank
18 | 4 | TrialMem | Trial CoD feature | 4 ASCII characters
22 | 4 | TrialMem | Trial CoD activated resources | 4 ASCII numeric characters
26 | 4 | TrialMem | Trial CoD sequence number | 4 ASCII numeric characters
30 | 2 | TrialMem | Trial CoD checksum | 2 ASCII characters
32 | 8 | TrialMem | Trial reserved bytes | 8 ASCII blanks
40 | 4 | TrialMem | Trial days expired | 4 ASCII numeric characters
44 | 4 | TrialMem | Trial days remaining | 4 ASCII numeric characters
48 | 14 | TrialMem | Trial reserved bytes | 14 ASCII blanks
62 | 4 | TrialMem | Trial unreturned resources | 4 ASCII numeric characters
66 | 1 | TrialMem | Trial reserved byte | 1 ASCII blank

R1--3. For the CoD option: The platform's ibm,set-system-parameter RTAS call specifying the Enhanced CoD Capacity Info, must not return a Status of 0 (Success); the expected return is a Status of -9002 (setting not allowed/authorized), however, under special cases a Status of -1 (Hardware Error) or one of the Busy or Extended Delay return Status values is allowed.

<emphasis>Restart Parameters</emphasis> This section and its subsections describe parameters that govern the actions that the platform firmware takes upon a restart (that is, reboot) after an unintended termination.

partition_auto_restart Parameter The partition_auto_restart parameter governs whether or not platform firmware attempts to restart a partition after an error which causes an abnormal partition termination. Neither a loss of external power without a UPS, nor a loss of both external power and battery power with a UPS, is an example of an error which causes abnormal partition termination. For terminations that involve only a specific partition (for example, a machine check), the partition_auto_restart parameter governs whether the partition restarts. For terminations that span the entire platform (for example, a checkstop), the platform may separately govern whether or not the entire platform restarts. If the platform does restart, however, partition_auto_restart determines whether or not an individual partition restarts. R1--1. For the LPAR option with the System Parameters option: If the platform supports the partition_auto_restart system parameter, the platform must establish and maintain across boot (unless explicitly altered by a user), for each partition, the one (1) byte parameter, whose initial value (depending upon platform policy) is one of the following binary values: 0 - Do not automatically restart the partition; 1 - Automatically restart the partition. R1--2. For the SMP option (non-LPAR) with the System Parameters option: If the platform supports the partition_auto_restart system parameter, the platform must establish and maintain across boot (unless explicitly altered by a user) one and only one (1) byte parameter, whose initial value (depending upon platform policy) is one of the following binary values: 0 - Do not automatically restart the OS; 1 - Automatically restart the OS.

platform_auto_power_restart Parameter The platform_auto_power_restart parameter governs whether or not platform firmware attempts to restart after power is restored following a power outage. R1--1. For the System Parameters option: If the platform supports the platform_auto_power_restart system parameter, the platform must maintain across boot (unless explicitly altered by a user) one and only one platform wide value of the one (1) byte parameter having one of the following binary values: 0 - Do not automatically restart 1 - Automatically restart partitions that were active when external power was lost. R1--2. For the LPAR option with the System Parameters option: If the platform supports the platform_auto_power_restart system parameter, the platform must provide the authority to set and read the platform_auto_power_restart system parameter to, at most, one partition at a time.

Remote Serial Port System Management Parameters R1--1. For the LPAR option with the System Parameters option: If the platform supports any of the following system parameters: sp-remote-pon, sp-rb4-pon, sp-snoop-str, and sp-serial-snoop; the platform must grant authority to set and read the single platform wide values of the respective system parameters to only the partition owning the resource required to implement the function, such as a serial port, where the valid data for the parameters are specified in . R1--2. For the System Parameters option: The platform must support the sp-rb4-pon system parameter if and only if the sp-remote-pon system parameter is supported and implemented by using “Ring Indicate” of a serial port. R1--3. For the LPAR option with the System Parameters option: Platforms that support the sp-snoop-str system parameter must maintain one and only one platform wide NULL terminated ASCII string value of the parameter, granting authority to set and read the sp-snoop-str system parameter to, at most, one partition at a time. R1--4. For the System Parameters option: To prevent return data truncation of the returned sp-snoop-str system parameter from the ibm,get-system-parameter RTAS call, the caller must supply a buffer length sufficient to contain the two string length bytes plus the ASCII string and the terminating ASCII NULL. R1--5. For the System Parameters option: The caller of the ibm,get-system-parameter RTAS call must supply a buffer length sufficient to contain the two string length bytes plus the ASCII string and the terminating ASCII NULL to prevent return data truncation of the returned sp-snoop-str system parameter. R1--6. For the System Parameters option: The platform must support both the sp-snoop-str and sp-serial-snoop system parameters if it supports either.

Surveillance Parameters For the definition of the sp-sen, sp-sti, and sp-del parameters, see . R1--1. For the LPAR option with the System Parameters option: If the platform supports any of the following system parameters: sp-sen, sp-sti, or sp-del; the platform must grant authority to set and read the single platform wide one (1) byte values, where the decimal representation is defined in , of the respective system parameters to only one partition at a time. R1--2. For the System Parameters option: If the platform supports any of the sp-sen, sp-sti or sp-del system parameters, it must support them all.

Call Home Parameter This parameter is used to provide input concerning certain call home values used when a call home function is provided. The data for the parameter is an ASCII string which provides additional information. R1--1. For the LPAR option with the System Parameters option: If the platform supports the sp-call-home parameter, the platform must grant authority to set and read the single platform wide value of the system parameter at any time to only one partition; where the data for the parameter is an ASCII string in the form <String_name1>=<string><ASCII NULL><String_name2>=<string><ASCII NULL>....<String_nameN>=<string><ASCII NULL><ASCII NULL> with string names defined as per . R1--2. For the System Parameters option: The caller of the ibm,get-system-parameter RTAS call must supply a buffer length sufficient to contain the maximum possible ASCII string returned, including the two ASCII NULLs, where indicates the maximum length of the data for each substring that comprises the sp-call-home data, to prevent return data truncation of the returned sp-call-home system parameter. R1--3. For the System Parameters option: If the platform supports the sp-call-home parameter, the platform must provide the sp-call-home parameter value defaults listed in prior to any ibm,set-system-parameter RTAS call. 
sp-call-home Strings

String_Name | Default | Range | Maximum Characters in String Data | Description
sp-rt-s<N> | NULL | | 20 | Retry string for serial port <N>
sp-ic-s<N> | NULL | | 12 | Protocol interdata block delay (*IC) for serial port <N>
sp-to-s<N> | NULL | | 12 | Protocol time out (*DT) for serial port <N>
sp-cd-s<N> | NULL | | 12 | Call Delay (*CD) for serial port <N>
sp-connect-s<N> | NULL | | 12 | Connect (*CX) for serial port <N>
sp-disconnect-s<N> | NULL | | 12 | Disconnect (*DX) for serial port <N>
sp-condout-s<N> | NULL | | 12 | Call-out condition (*C0) for serial port <N>
sp-condwait-s<N> | NULL | | 12 | Call-wait (*C0) for serial port <N>
sp-condin-s<N> | NULL | | 12 | Call-in condition (*C1) for serial port <N>
sp-waitcall-s<N> | NULL | | 12 | Wait call (*WC) for serial port <N>
sp-page-s<N> | NULL | | 20 | Describes how to Page a beeper for serial port <N>
sp-diok-s<N> | off | on, off | 4 | Serial Port <N> Call-in (Dial in authorized on the port)
sp-dook-s<N> | off | on, off | 4 | Serial Port <N> Call-out (Dial out authorized on the port)
sp-dookc | off | on, off | 4 | Call-out before restart (Dial out for system crash using authorized serial port)
sp-ls-s<N> | 9600 | 300, 600, 1200, 2000, 2400, 3600, 4800, 7200, 9600, 19200, 38400 | 6 | S<N> line speed
sp-modemf-s<N> | NULL | | 120 | Filename of the last modem file used to configure modem parameters
sp-phsvc | 20 blank characters | 20 characters max | 20 | Service Center Telephone Number (*PS)
sp-phadm | 20 blank characters | 20 characters max | 20 | Customer Administration Center Telephone Number (*PH)
sp-pager | 20 blank characters | 20 characters max | 20 | Digital Pager Telephone Number
sp-phsys | 20 blank characters | 20 characters max | 20 | Customer System Telephone number (*PY)
sp-vox | 20 blank characters | 20 characters max | 20 | Customer Voice telephone number (*PO)
sp-acct | 12 blank characters | 12 characters max | 12 | Customer Account Number (*CA)
sp-cop | first | first, all | 6 | Call-out policy (first/all) - numbers to call in case of failure
sp-retlogid | 12 blank characters | 12 characters max | 12 | Customer RETAIN Login Userid (*LI)
sp-retpw | 16 blank characters | 16 characters max | 12 | Customer RETAIN Login password (*PW)
sp-rto | 120 | >1 | 12 | Remote Timeout (in seconds) (*RT)
sp-rlat | 2 | >1 | 12 | Remote Latency (in seconds) (*RL)
sp-rn | 2 | 0 or any positive number | 12 | Number of retries (while busy (*RN))
sp-sysname | 15 blank characters | 15 characters max | 15 | System Name (system administrator aid)

Notes: <N> is substituted with a modem number, i.e. 1 or 2. NULL as a default indicates that the string is given a name, but no value, e.g. sp-modemf-s=
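The string layout required for sp-call-home (name=value substrings, each ended by an ASCII NULL, with a second ASCII NULL closing the list) can be illustrated with a minimal Python sketch. The function name parse_call_home is hypothetical, and the sketch assumes the caller has already removed any length prefix that ibm,get-system-parameter places in front of the data.

```python
def parse_call_home(data):
    """Split sp-call-home data into a {name: value} dictionary.

    Assumption: `data` starts at the first String_name and ends with the
    double ASCII NULL; any length prefix has already been stripped.
    """
    end = data.find(b"\x00\x00")           # the double NULL closes the list
    body = data if end < 0 else data[:end]
    result = {}
    for item in body.split(b"\x00"):
        if not item:
            continue
        name, sep, value = item.partition(b"=")
        if not sep:
            raise ValueError("substring without '=': %r" % item)
        # A NULL default shows up as a name with an empty value.
        result[name.decode("ascii")] = value.decode("ascii")
    return result


raw = b"sp-diok-s1=on\x00sp-ls-s1=9600\x00sp-modemf-s1=\x00\x00"
print(parse_call_home(raw))
```

A string such as sp-modemf-s1 with a NULL default is returned with an empty value, matching the note above.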

Current Flash Image Parameter In systems with storage for more than one Flash image, the sp-current-flash-image parameter indicates which Flash image is currently being used by the service processor. This is typically the Flash image used at the last boot.
R1--1. For the LPAR option with the System Parameters option: Platforms that support the sp-current-flash-image system parameter must authorize all partitions to get the single platform wide one (1) byte value of the system parameter, whose decimal representation is defined in .
R1--2. For the System Parameters option: Platforms that support the sp-current-flash-image system parameter must support the ibm,manage-flash-image RTAS call.

Platform Dump Max Size Parameter This parameter indicates the size (in bytes) needed for dumps returned from the ibm,platform-dump RTAS function. R1--1. For the Platform Dump option: If the ibm,platform-dump RTAS call is authorized for the partition, the platform must authorize the partition to get the platform-dump-max-size system parameter; where the value returned must indicate the sum (in bytes) of the maximum size of each unique platform dump type that the ibm,platform-dump RTAS call could return. Programming Note: The intent of platform-dump-max-size is for the platform to specify, in advance, the sum of the maximum sizes of all the unique dump types that it can generate. This is to allow the OS to reserve space for one log of each type. In the case of any change in the value of this parameter, the platform may generate a Platform Event Log entry announcing the change in the maximum size, and specifying the new size in the IO Events Section. This entry, when generated, is then returned by the event-scan RTAS call.

Storage Preservation Option System Parameters The epow3-quiesce-time system parameter contains the time granted to the current instance of a client program to perform quiesce activities in preparation for a memory preservation boot. This quiesce time is the time used by the client program to do such things as quiesce and power off I/O not needed for memory preservation boot processing, in order to conserve batteries. A client program utilizing the Storage Preservation option, upon completion of quiesce activities, requests a reboot. The platform, upon seeing an EPOW class 3 condition, and if both the memory-preservation-boot-time and epow3-quiesce-time system parameters are non-zero, starts a timer with an initial value equal to the epow3-quiesce-time. If the timer expires before the client program performs a reboot, the platform forces a reboot of the client program. The memory-preservation-boot-time system parameter contains the time granted to the rebooted instance of the partition to perform the saving of preserved memory. The client program, upon completion of the saving of preserved memory, requests a shutdown. The platform, upon initiation of a memory preservation boot starts a timer with an initial value equal to the memory-preservation-boot-time, providing the value of the memory-preservation-boot-time parameter is non-zero. If the timer expires before the client program performs a shutdown, the platform forces a shutdown of the client program. Thus, the platform uses the memory-preservation-boot-time system parameter as a policy attribute. If the client program has set the value of this parameter to a non-zero value, then the memory preservation boot timers are enabled. If the memory-preservation-boot-time parameter is zero (independent of the epow3-quiesce-time setting), the platform does not initiate the memory preservation boot timers. 
To use the memory preservation boot timers, the client program registers its LMBs for preservation and sets the memory-preservation-boot-time via the ibm,set-system-parameter RTAS call. If an EPOW class 3 is sent to a client program and the client program has set its memory-preservation-boot-time parameter, then the platform starts the timer for epow3-quiesce-time. The client program on reboot uses the get-sensor RTAS call (to detect EPOW condition) and the “ibm,preserved-storage” property in the device tree to drive memory preservation processing as necessary. The values of memory-preservation-boot-time and epow3-quiesce-time prior to being set for a client program are 0. These system parameters are persisted, as are all system parameters. R1--1. For the Storage Preservation option: The platform must implement the memory-preservation-boot-time and epow3-quiesce-time system parameters and must set their initial values to 0. R1--2. For the Storage Preservation option: If the memory-preservation-boot-time system parameter is non-zero for a client program and if the platform delivers an EPOW class 3 indication to the client program, the platform must do all of the following: Upon delivering the EPOW class 3 to the client program, if the epow3-quiesce-time system parameter is non-zero, then set a timer based on the client program’s epow3-quiesce-time system parameter and force a reboot of the client program on timer expiration, if the client program does not request a reboot itself before the timer expires. Upon initiation of the memory preservation boot, set a timer based on the client program’s memory-preservation-boot-time system parameter and on timer expiration, force a shutdown of the client program if the client program does not request a shutdown itself before the timer expires.
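The timer policy described above reduces to a small decision rule, sketched here in Python. The function name and the tuple return shape are hypothetical; the text does not state the time units, so the values are passed through unchanged.

```python
def timers_on_epow3(memory_preservation_boot_time, epow3_quiesce_time):
    """Return (quiesce_timer, boot_timer) for an EPOW class 3 event.

    None means the platform does not start that timer. The boot timer is
    started later, when the memory preservation boot is initiated.
    """
    if memory_preservation_boot_time == 0:
        # Policy attribute is zero: no timers, whatever the quiesce time is.
        return (None, None)
    quiesce_timer = epow3_quiesce_time if epow3_quiesce_time != 0 else None
    return (quiesce_timer, memory_preservation_boot_time)


print(timers_on_epow3(60, 30))
print(timers_on_epow3(0, 30))
```

Expiry of the quiesce timer leads to a forced reboot, and expiry of the boot timer leads to a forced shutdown, unless the client program makes the corresponding request first.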

SCSI Initiator Identifier System Parameters Certain physical SCSI IOAs maintain their previous settings for SCSI initiator identifier, while others require that the platform set this value during I/O adapter initialization. Since the initialization of I/O adapters in the boot path is done by firmware, a method is required for the OS to inform the platform firmware of such settings. Given that an OS owns a slot, and that slot contains a supported SCSI I/O adapter, the OS may use the ibm,set-system-parameter RTAS call specifying SCSI Initiator Identifier system parameters to instruct the firmware how to initialize the I/O adapter to ensure that it does not conflict with other SCSI initiators on the same bus. The ibm,get-system-parameter RTAS call is used to verify the SCSI Initiator Identifier system parameters value for any OS owned slot. When ibm,set-system-parameter is called specifying SCSI Initiator Identifier system parameters, the buffer contains the standard two byte length field plus two NULL terminated strings. The first string contains the location code of an I/O Adapter's SCSI bus connector, and the second string contains one of the decimal values 0-15 representing the value of the SCSI Initiator Identifier that the platform's firmware is to use to initialize the SCSI controller for that bus. When ibm,get-system-parameter is called specifying SCSI Initiator Identifier system parameters, the buffer contains the standard two byte length field plus a NULL terminated string that contains the location code of an I/O Adapter's SCSI bus connector. Upon successful return, the buffer contains the standard two byte length field plus two NULL terminated strings. The first string contains the location code of the I/O Adapter's SCSI bus connector, and the second string contains one of the decimal values 0-15 representing the value of the SCSI Initiator Identifier that the platform's firmware is to use to initialize the SCSI controller for that bus. 
Implementation Note: For IOAs that have multiple connectors per bus, the location code specifies the connector for the external bus.
Interaction between SCSI Initiator Identifier system parameters and DR operations produces unique situations. The platform maintains only the latest SCSI Initiator Identifier set for any given location code. On DR operations, the value is normally retained until the IOA owner explicitly changes it. If a DR operation replaces the original IOA with a different type of IOA, such that the previously set SCSI Initiator Identifier system parameters no longer make sense (IOA is not a supported SCSI adapter or the connector location codes do not match), the platform firmware clears the SCSI Initiator Identifier system parameters for the location code and performs the platform default IOA initialization.
R1--1. For the SCSI Initiator Identifier System Parameters option: When ibm,set-system-parameter is called specifying SCSI Initiator Identifier system parameters, RTAS must return a Status of -3 (Parameter error) on any of the following conditions:
The binary value of the first two bytes in the buffer, plus 2, is greater than the buffer length parameter.
The buffer length parameter is greater than 1026.
The N bytes of buffer contents (N being the binary value of the first two buffer bytes) do not contain two NULL terminated strings.
The contents of the first NULL terminated buffer string do not match the format of a valid platform location code.
The contents of the second NULL terminated buffer string do not contain a decimal value in the range of 0 to 15.
R1--2.
For the SCSI Initiator Identifier System Parameters option: When ibm,set-system-parameter is called specifying SCSI Initiator Identifier system parameters, and the request successfully passes the Requirements of , the first NULL terminated buffer string must contain a valid formatted platform location code for a currently installed slot owned by the calling OS, or the platform must return “Not authorized” Status. R1--3. For the SCSI Initiator Identifier System Parameters option: When ibm,set-system-parameter is called specifying SCSI Initiator Identifier system parameters, and the request successfully passes the Requirements of , the first NULL terminated buffer string must contain a valid formatted platform location code for a SCSI bus connector of a supported SCSI I/O adapter currently installed in a slot owned by the calling OS, or the platform must return “Parameter Error” Status. R1--4. For the SCSI Initiator Identifier System Parameters option: When ibm,set-system-parameter is called specifying SCSI Initiator Identifier system parameters, and the request successfully passes the Requirements of , the firmware must record the value supplied in the second NULL terminated buffer string for use in initializing the SCSI initiator identifier of the SCSI I/O adapter contained in the slot specified by the first NULL terminated buffer string and return a Status of 0 (success) (except in the case of hardware errors or busy conditions). R1--5. For the SCSI Initiator Identifier System Parameters option: When ibm,get-system-parameter is called specifying SCSI Initiator Identifier system parameters, RTAS must return a Status of -3 (Parameter error) on any of the following conditions: The binary value of the first two bytes in the buffer, plus 2, is greater than the buffer length parameter. The buffer length parameter is greater than 1026. The N bytes of buffer contents (N being the binary value of the first two buffer bytes) does not contain one NULL terminated string. 
The contents of the NULL terminated buffer string does not match the format of a valid platform location code. R1--6. For the SCSI Initiator Identifier System Parameters option: When ibm,get-system-parameter is called specifying SCSI Initiator Identifier system parameters, and the request successfully passes the Requirements of , the NULL terminated buffer string must contain a valid formatted platform location code for a currently installed slot owned by the calling OS, or the platform must return “Not authorized” Status. R1--7. For the SCSI Initiator Identifier System Parameters option: When ibm,get-system-parameter is called specifying SCSI Initiator Identifier system parameters, and the request successfully passes the Requirements of , the NULL terminated buffer string must contain a valid formatted platform location code for a SCSI bus connector of a supported SCSI I/O adapter currently installed in a slot owned by the calling OS, or the platform must return a Status of -3 (parameter error). R1--8. For the SCSI Initiator Identifier System Parameters option: When ibm,get-system-parameter is called specifying SCSI Initiator Identifier system parameters, and the request successfully passes the Requirements of , the firmware must: Increase the value contained in the first two bytes of the buffer to cover both the length of the location code NULL terminated string and a NULL terminated string representing the decimal value that the platform uses to initialize the SCSI initiator identifier of the SCSI I/O adapter contained in the slot specified by the first NULL terminated buffer string. If there is room in the buffer, append the NULL terminated string representing the decimal value that the platform uses to initialize the SCSI initiator identifier of the SCSI I/O adapter contained in the slot specified by the first NULL terminated buffer string. Return a Status of 0 (success) (except in the case of hardware errors or busy conditions). R1--9. 
For the SCSI Initiator Identifier System Parameters option: When the platform firmware initializes an IOA and a SCSI Initiator Identifier system parameter is set for that IOA's slot location code, and the SCSI Initiator Identifier system parameter is incompatible with the currently installed IOA (IOA is not a supported SCSI adapter or the connector location codes do not match a SCSI bus connector for that IOA), the platform must clear the incompatible SCSI Initiator Identifier system parameter and proceed to initialize the IOA using platform default behaviors.
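The buffer layout for the set call (two byte length field followed by two NULL terminated strings) and the format checks that do not need platform knowledge can be sketched as follows. This is an illustration only: the function names are hypothetical, the length field is assumed to be big-endian, and the location code format check is omitted because it is platform specific. The location code in the usage lines is a made-up placeholder.

```python
import struct

MAX_BUFFER_LENGTH = 1026  # larger buffer length parameters are rejected


def build_set_buffer(location_code, initiator_id):
    """Build: 2-byte length, location code + NULL, decimal ID + NULL."""
    if not 0 <= initiator_id <= 15:
        raise ValueError("SCSI initiator identifier must be 0-15")
    payload = (location_code.encode("ascii") + b"\x00"
               + str(initiator_id).encode("ascii") + b"\x00")
    buf = struct.pack(">H", len(payload)) + payload  # assumption: big-endian
    if len(buf) > MAX_BUFFER_LENGTH:
        raise ValueError("buffer exceeds 1026 bytes")
    return buf


def check_set_buffer(buf, buffer_length):
    """Return 0 if the format checks pass, otherwise -3 (parameter error)."""
    if buffer_length > MAX_BUFFER_LENGTH or len(buf) < 2:
        return -3
    (n,) = struct.unpack_from(">H", buf)
    if n + 2 > buffer_length or n + 2 > len(buf):
        return -3
    parts = buf[2:2 + n].split(b"\x00")
    # Two NULL terminated strings split into [first, second, b""].
    if len(parts) != 3 or parts[2] != b"":
        return -3
    if not parts[1].isdigit() or not 0 <= int(parts[1]) <= 15:
        return -3
    return 0


buf = build_set_buffer("U1-P1-C1-T1", 7)  # hypothetical location code
print(check_set_buffer(buf, len(buf)))
```

Ownership and adapter type checks, which lead to the "Not authorized" and "Parameter Error" results in the later requirements, depend on platform state and are outside this sketch.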

CoD Options The CoD Options system parameter allows specification of various CoD options.
R1--1. When ibm,get-system-parameter is called specifying the CoD Options system parameter, the first two bytes of the value returned must contain the full length of the parameter data, including the length of the NULL. The two byte binary length field is followed by a variable of the form “keyword” followed by “=” followed by “value” and terminated by a semicolon (“;”), where the contents of “value” must be an ASCII printable character string.
R1--2. The corresponding keyword and values for the CoD Options parameter are defined in .
CoD Options System Parameter Keyword and Values
Keyword | Permitted Values | Definition
LPoptions | yes, no | no: The platform does not support the Low Priced adapters and devices. yes: The platform supports the Low Priced adapters and devices. Absence of the keyword is the same as the keyword with the value of “yes”.
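A reader for the “keyword=value;” form might look like the following Python sketch. The function names are hypothetical, and the two byte length field is assumed to be big-endian and to count the bytes that follow it, including the terminating NULL.

```python
import struct


def parse_keyword_values(buf):
    """Parse a 2-byte length followed by 'keyword=value;' items and a NULL."""
    (length,) = struct.unpack_from(">H", buf)       # assumption: big-endian
    data = buf[2:2 + length]
    text = data.split(b"\x00", 1)[0].decode("ascii")
    result = {}
    for item in text.split(";"):
        if not item:
            continue
        keyword, sep, value = item.partition("=")
        if not sep:
            raise ValueError("item without '=': %r" % item)
        result[keyword] = value
    return result


def low_priced_supported(options):
    """Absence of LPoptions is treated the same as LPoptions=yes."""
    return options.get("LPoptions", "yes") == "yes"


data = b"LPoptions=no;\x00"
options = parse_keyword_values(struct.pack(">H", len(data)) + data)
print(options, low_priced_supported(options))
```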

Platform Error Classification The Platform Error Classification system parameter specifies whether the OS should process platform reported errors as informational errors as opposed to service actionable events. R1--1. When ibm,get-system-parameter is called specifying the Platform Error Classification system parameter, the platform must return a value of “1” if all errors returned in event-scan, check-exception, rtas-last-error and ibm,slot-error-detail calls should be treated as informational errors in the sense that they not be reported by service applications as service actionable events and otherwise must return a value of “0”. Programming Note: Service applications within an operating system may obtain information about platform errors and take service actions (such as reporting the errors to a call center or other error aggregation point) based on errors logged. Service applications running in multiple partitions, each receiving platform error events, may all report the same error to an aggregation point causing duplicated error reports. To eliminate this duplication, a platform might choose to log errors to only one partition in a system. That, however, would leave an incomplete error record in individual partition and eliminate notifications that each partition OS should be aware of (such as EPOW events). To allow platform errors to be reported to an OS, but prevent the forwarding of the errors as service actionable events to an error aggregation point, the Platform Error Classification system parameter may be set to a value of 1. The OS should not change how it logs an error based on this parameter, nor should the OS change any error severity associated with the log based on the parameter. Rather it is left to service applications to query the system parameter and take actions based on it.

Firmware Boot Options The Firmware Boot Options system parameter allows specification of various firmware boot settings.
R1--1. When ibm,get-system-parameter is called specifying the Firmware Boot Options system parameter, the first two bytes of the value returned must be binary and must contain the full length of the parameter data, including the length of the NULL, and the field following the length field must be a variable of the form: “keyword” followed by “=” followed by “value” and terminated by a semicolon (“;”), where the contents of “value” must be an ASCII printable character string.
R1--2. When ibm,set-system-parameter is called specifying the Firmware Boot Options system parameter, the first two bytes of the buffer must be binary and must contain the full length of the parameter data, including the length of the NULL, and the field following the length field must be a variable of the form: “keyword” followed by “=” followed by “value” and terminated by a semicolon (“;”), where the contents of “value” must be an ASCII printable character string, and if the caller is not authorized to adjust at least one of the specified keywords, the call must return with a status of -9002.
R1--3. Keyword and values for the Firmware Boot Options parameter must be as defined in .
Firmware Boot Options System Parameter Keywords and Values
Keyword | Permitted Values | Definition
PlatformBootSpeed | fast, slow | fast: The platform will perform a minimal set of hardware tests before loading the OS. slow: The platform will perform a comprehensive set of hardware tests before loading the OS. Absence of the keyword implies the platform does not support an alterable boot speed.
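For the set direction, a caller has to produce the same layout. The sketch below builds such a buffer for the one keyword listed in the table; the function name and the PERMITTED mapping are hypothetical, and the length field is assumed to be big-endian.

```python
import struct

# Keywords and values taken from the table above.
PERMITTED = {"PlatformBootSpeed": ("fast", "slow")}


def build_boot_options(settings):
    """Encode {keyword: value} as 2-byte length + 'keyword=value;' + NULL."""
    items = []
    for keyword, value in settings.items():
        if value not in PERMITTED.get(keyword, ()):
            raise ValueError("unsupported setting %s=%s" % (keyword, value))
        items.append("%s=%s;" % (keyword, value))
    data = "".join(items).encode("ascii") + b"\x00"
    # The length covers the parameter data including the NULL.
    return struct.pack(">H", len(data)) + data


print(build_boot_options({"PlatformBootSpeed": "fast"}))
```

A caller that is not authorized for one of the keywords should expect a status of -9002 from the set call, as stated in the requirement above.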

Platform Processor Diagnostics Options The platform-processor-diagnostics-run-mode system parameter allows the operating system to query or control how platform run-time processor diagnostics are executed by the platform. Provision is made by this parameter for the platform to execute run-time diagnostic tests to verify various processor functions. These diagnostic tests typically would be performed by the hypervisor against each processor in the system.
R1--1. When ibm,get-system-parameter is called with the platform-processor-diagnostics-run-mode token, the platform must return a one-byte parameter indicating the current run-mode of platform processor diagnostics as one of the following:
0 = disabled: indicates that the platform will not run processor run-time diagnostics.
1 = staggered: indicates that the platform is set to run processor diagnostics on each processor on a periodic basis, but not attempt to schedule the tests for all processors at the same time. The frequency at which the tests will run is defined by the platform.
2 = immediate: indicates that the platform is currently in the process of running diagnostics against the processors in a system on a non-staggered basis, either as a result of an “immediate” or “periodic” setting.
3 = periodic: indicates that the platform is scheduled to run diagnostics against all the processors in the system at a specific time scheduled by the platform.
R1--2. When ibm,set-system-parameter is called specifying the platform-processor-diagnostics-run-mode token, the one-byte parameter passed must indicate the run-mode of platform periodic diagnostics desired as one of the following:
0 = disabled: indicates that the platform should not run any processor run-time diagnostics. Any currently running diagnostics will be terminated.
1 = staggered: indicates that the platform should run diagnostics periodically against each processor in the system, but not attempt to schedule the tests for all processors at the same time. The frequency at which the tests will run is defined by the platform.
2 = immediate: indicates that the platform should immediately begin the process of running processor diagnostics on all of the processors in the system. This setting only temporarily overrides the setting of “disabled”, “staggered” or “periodic” and the platform will revert to the last setting of “disabled”, “staggered” or “periodic” once the immediately run diagnostics are complete.
Implementation Notes: To prevent conflicts in the setting of the run-mode, the platform should only support this parameter for one partition in a running system. The options may also be set by the platform. The last value set will take precedence over any previous settings.
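The run-mode values and the temporary nature of the “immediate” setting can be modelled with a short Python sketch. The class and function names are hypothetical, and the one-byte value is assumed to be passed without any length prefix.

```python
from enum import IntEnum


class RunMode(IntEnum):
    DISABLED = 0
    STAGGERED = 1
    IMMEDIATE = 2
    PERIODIC = 3


def decode_run_mode(data):
    """Decode the one-byte run-mode value; unknown values raise ValueError."""
    if len(data) != 1:
        raise ValueError("run-mode parameter is one byte")
    return RunMode(data[0])


class RunModeState:
    """Toy model: 'immediate' overrides the last setting only temporarily."""

    def __init__(self, mode=RunMode.DISABLED):
        self.current = mode
        self._last_setting = mode

    def set(self, mode):
        mode = RunMode(mode)
        if mode is not RunMode.IMMEDIATE:
            self._last_setting = mode   # the last value set takes precedence
        self.current = mode

    def immediate_run_complete(self):
        # Revert to the last disabled/staggered/periodic setting.
        self.current = self._last_setting


state = RunModeState(RunMode.STAGGERED)
state.set(RunMode.IMMEDIATE)
state.immediate_run_complete()
print(state.current.name)
```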

Processor Module Information The Processor Module Information system parameter allows transferring of certain processor module information from the platform to the OS. The information in the parameter is global for the platform and encompasses all resources on the platform, not just those available to the partition, and the ibm,get-system-parameter will never return a Status of -9002 (Not Authorized). This parameter is read-only. R1--1. For the LPAR option with the System Parameters option: If the platform supports the Processor Module Information system parameter, then it must provide the following information in the parameter, and the information returned for every partition must be the same, with all the resources of the platform encompassed: 2 byte binary number (N) of module types followed by N module specifiers of the form: 2 byte binary number (M) of sockets of this module type 2 byte binary number (L) of chips per this module type 2 byte binary number (K) of cores per chip in this module type. R1--2. For the LPAR option with the System Parameters option: For the Processor Module Information system parameter, the ibm,get-system-parameter RTAS call must never return a Status of -9002 (Not Authorized), and the ibm,set-system-parameter RTAS call must always return a Status of -9002 (Setting not allowed/authorized).
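The module specifier layout in R1--1 can be decoded as in the following sketch. The names are hypothetical, the two byte fields are assumed to be big-endian, and the input is assumed to start at the module type count (any length prefix already removed).

```python
import struct
from typing import NamedTuple


class ModuleType(NamedTuple):
    sockets: int           # M: sockets of this module type
    chips_per_module: int  # L: chips per module of this type
    cores_per_chip: int    # K: cores per chip in this module type


def parse_module_info(data):
    """Decode N followed by N (M, L, K) specifiers of 2 bytes per field."""
    (count,) = struct.unpack_from(">H", data, 0)
    modules = []
    for i in range(count):
        # struct.error is raised if the data is shorter than N implies.
        modules.append(ModuleType(*struct.unpack_from(">HHH", data, 2 + 6 * i)))
    return modules


sample = struct.pack(">HHHHHHH", 2, 4, 2, 8, 1, 1, 4)
modules = parse_module_info(sample)
total_cores = sum(m.sockets * m.chips_per_module * m.cores_per_chip
                  for m in modules)
print(modules, total_cores)
```

The product M x L x K per module type gives a platform wide core count, which is consistent with the statement that the information covers all resources on the platform rather than only those of the calling partition.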

Cede Latency Settings Information The Cede Latency Settings Information system parameter informs the OS of the maximum latency to wake up from the various platform supported processor sleep states that it might employ for idle processors. The information in the parameter is global for the platform and encompasses all processors on the platform, and the ibm,get-system-parameter will never return a Status of -9002 (Not Authorized). This parameter is read-only. As the architecture evolves, the number of fields per record is likely to increase; calling software should be written to handle fewer fields (should it find itself running on a platform supporting an older version of the architecture) and ignore additional fields (should it find itself running on a platform supporting a newer version of the architecture). Due to partition migration, the support for the cede latency setting system parameter, the number of supported cede latency settings (and thus the number of reported records) and the number of fields reported per record might change from call to call; calling software should be written to handle this variability.
R1--1. For the PEM option with the System Parameters option: If the platform supports the cede latency settings information system parameter it must provide the following information in the NULL terminated parameter string:
The first byte is the binary length “N” of each cede latency setting record minus one (zero indicates a length of 1 byte).
For each supported cede latency setting, a cede latency setting record consisting of: The first “N” bytes of .
Byte definitions within a cede latency setting record
Field (in order within a record) | Length | Values | Comments
Cede Latency Specifier Value | 1 | Binary Values 0-255 | Records in ascending cede latency specifier value order with no holes.
Maximum wakeup latency in time base ticks | 8 | 0x0000000000000000 - 0xFFFFFFFFFFFFFFFF |
Responsive to external interrupts | 1 | Binary True/False |

R1--2. For the PEM option with the System Parameters option: For the cede latency specifier system parameter, the ibm,get-system-parameter RTAS call must never return a Status of -9002 (Not Authorized), and the ibm,set-system-parameter RTAS call must always return a Status of -9002 (Setting not allowed/authorized).
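A tolerant reader for the cede latency records, handling both shorter and longer records as the text above recommends, might be sketched as follows. The function name is hypothetical. The sketch assumes the latency field is big-endian, that any non-zero byte means True, and that the input starts at the record length byte with the length prefix and terminating NULL already removed.

```python
def parse_cede_latency(data):
    """Decode cede latency setting records into a list of dictionaries."""
    record_len = data[0] + 1              # first byte holds N minus one
    records = []
    offset = 1
    while offset + record_len <= len(data):
        rec = data[offset:offset + record_len]
        entry = {"specifier": rec[0]}
        if record_len >= 9:               # older platforms may report fewer fields
            entry["max_wakeup_latency_ticks"] = int.from_bytes(rec[1:9], "big")
        if record_len >= 10:
            entry["responsive_to_external_interrupts"] = rec[9] != 0
        # Bytes beyond the tenth belong to fields defined later; ignore them.
        records.append(entry)
        offset += record_len
    return records


sample = (bytes([9])
          + bytes([0]) + (100).to_bytes(8, "big") + bytes([1])
          + bytes([1]) + (5000).to_bytes(8, "big") + bytes([0]))
for record in parse_cede_latency(sample):
    print(record)
```

Because the record count and record length can change from call to call after a partition migration, the result is best treated as a snapshot rather than cached indefinitely.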

Target Active Memory Compression Factor The target active memory compression factor system parameter informs the OS of the target memory capacity increase the customer expects to achieve due to active memory compression. The factor is expressed in whole percentage with the minimum value of 100 and the maximum value of 1000. The ibm,get-system-parameter for parameter token 46 will never return a Status of -9002 (Not Authorized). This parameter is read-only. R1--1. For the Active Memory Compression option with the System Parameters option: For the Target Active Memory Compression Factor system parameter, the ibm,get-system-parameter RTAS call must never return a Status of -9002 (Not Authorized). R1--2. For the Active Memory Compression option with the System Parameters option: If the Active Memory Compression option is enabled for the partition, the platform must provide in response to the ibm,get-system-parameter for parameter token 46 the two byte target active memory compression factor in binary format in the range (0x0064 -- 0x03E8) (equivalent to 100 -- 1000 decimal). R1--3. For the Active Memory Compression option with the System Parameters option: If the Active Memory Compression option is disabled for the system/partition, the platform must provide in response to the ibm,get-system-parameter for parameter token 46 the two byte value 0x0000. R1--4. For the Active Memory Compression option with the System Parameters option: For the target active memory compression factor system parameter, the ibm,set-system-parameter RTAS call must always return a Status of -9002 (Setting not allowed/authorized).
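The two byte factor can be decoded as in this sketch. The function names are hypothetical and the value is assumed to be big-endian. The capacity calculation further assumes that the percentage is applied to real memory size (so that 100 means no expansion); that reading is an interpretation of "whole percentage" and is not spelled out in the requirements.

```python
def decode_compression_factor(value):
    """Return the factor (100-1000), or None when the option is disabled."""
    if len(value) != 2:
        raise ValueError("expected a two byte value")
    factor = int.from_bytes(value, "big")   # assumption: big-endian
    if factor == 0:
        return None                         # 0x0000: option disabled
    if not 100 <= factor <= 1000:
        raise ValueError("factor outside 0x0064-0x03E8: %d" % factor)
    return factor


def target_capacity(real_memory_bytes, factor):
    """Assumed interpretation: target capacity = real memory x factor / 100."""
    return real_memory_bytes * factor // 100


factor = decode_compression_factor(b"\x00\x96")
print(factor, target_capacity(4096, factor))
```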

Performance Boost Modes Vector A variety of platform dependent configuration modes might result in a boost in platform computational capacity. The ibm,get-system-parameter through the performance boost modes vector system parameter communicates to the client program which of these modes are available on the specific platform, which of these modes the client program may enable/disable, and which ones are active. The performance boost mode vectors are 32 bytes (256 bits) long. Each bit position within the performance boost mode vector corresponds to a specific function as specified in . The first defined boost mode is assigned to the highest order bit position. As new boost modes are defined, they are assigned to sequential lower order vector bit positions. Given that the second version of the vector from the ibm,get-system-parameter RTAS call (specifying which modes may be enabled/disabled by the client program) is non-zero, the platform supports calling the ibm,set-system-parameter RTAS call specifying the performance boost modes vector token. The ibm,set-system-parameter RTAS call specifying the performance boost modes vector token takes a single vector as input.
Performance Boost Modes Vector Bits Definitions
Bit Position (1 based ordinal) | Definition
1 | Extended Cache Option
2 - 256 | Reserved

R1--1. For the Performance Boost Modes option: The platform must implement the System Parameters option.
R1--2. For the Performance Boost Modes option: The 96 byte report returned by ibm,get-system-parameter for parameter token 47 must consist of three 32 byte bit vectors as defined by .
R1--3. For the Performance Boost Modes option: The first 32 byte bit vector returned by ibm,get-system-parameter for parameter token 47 must contain 1s in the bit positions defined by for the performance boost modes that are both supported by the platform and authorized for the caller (by means outside of the scope of LoPAR).
R1--4. For the Performance Boost Modes option: The second 32 byte bit vector returned by ibm,get-system-parameter for parameter token 47 must contain 1s in the bit positions defined by for the performance boost modes that are both represented in the first vector and may be enabled/disabled by the caller through the ibm,set-system-parameter using parameter token 47.
R1--5. For the Performance Boost Modes option: The third 32 byte bit vector returned by ibm,get-system-parameter for parameter token 47 must contain 1s in the bit positions defined by for the performance boost modes that are both represented in the first vector and are enabled either by default or by the caller through the ibm,set-system-parameter using parameter token 47.
R1--6. For the Performance Boost Modes option: If the ibm,get-system-parameter for parameter token 47 communicated that the client program has the ability to enable/disable one or more of the boost modes, then the platform must support the performance boost modes vector token for ibm,set-system-parameter.
R1--7. For the Performance Boost Modes option: If no boost modes can be enabled/disabled then a call to ibm,set-system-parameter specifying the boost modes vector token must return either:
“System parameter not supported”, as the implementation need not code support for the token if no mode setting is supported.
“Setting not allowed/authorized” if the implementation supports setting boost modes but the caller is not authorized to do so.
R1--8. For the Performance Boost Modes option: If any bit in the input vector to the ibm,set-system-parameter RTAS call for parameter token 47 is a one and does not correspond to a bit that is a one in the second version of the vector returned by the ibm,get-system-parameter RTAS call for parameter token 47, the ibm,set-system-parameter RTAS call must return parameter error.
R1--9. For the Performance Boost Modes option: If the corresponding bit that was a one in the second version of the vector returned by the ibm,get-system-parameter RTAS call for parameter token 47 is a one in the input vector to the ibm,set-system-parameter RTAS call for parameter token 47, then upon successful return that corresponding boost mode must be enabled.
R1--10. For the Performance Boost Modes option: If the corresponding bit that was a one in the second version of the vector returned by the ibm,get-system-parameter RTAS call for parameter token 47 is a zero in the input vector to the ibm,set-system-parameter RTAS call for parameter token 47, then upon successful return that corresponding boost mode must be disabled.
R1--11. For the Performance Boost Modes option: To properly awake from partition suspension and handle dynamic reconfiguration, the client program must be prepared to handle changes in the bit settings within the bit vectors reported by the ibm,get-system-parameter RTAS call for parameter token 47.
R1--12. For the Performance Boost Modes option: Since it is expected that bit positions defined by will expand over time, to avoid firmware level compatibility issues, the client program must ignore bit settings within the bit vectors reported by the ibm,get-system-parameter RTAS call for parameter token 47 beyond those defined when the client program was designed.
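The three vector report and the set-time check can be illustrated with a Python sketch. All names are hypothetical. The sketch assumes that bit position 1 (the highest order bit) is the most significant bit of the first byte of each 32 byte vector, and that the 96 byte report is passed without any length prefix.

```python
VECTOR_BYTES = 32
VECTOR_BITS = VECTOR_BYTES * 8
EXTENDED_CACHE = 1            # bit position 1 in the table of bit definitions


def split_report(report):
    """Return (supported, changeable, enabled) vectors as integers."""
    if len(report) != 3 * VECTOR_BYTES:
        raise ValueError("expected a 96 byte report")
    return tuple(int.from_bytes(report[i:i + VECTOR_BYTES], "big")
                 for i in (0, VECTOR_BYTES, 2 * VECTOR_BYTES))


def bit(vector, position):
    """Test a 1-based bit position, counting from the highest order bit."""
    return bool((vector >> (VECTOR_BITS - position)) & 1)


def build_set_vector(changeable, enable_positions):
    """Build the single input vector for the set call."""
    vector = 0
    for position in enable_positions:
        mask = 1 << (VECTOR_BITS - position)
        if not changeable & mask:
            # The platform would answer such a request with parameter error.
            raise ValueError("bit position %d is not changeable" % position)
        vector |= mask
    return vector.to_bytes(VECTOR_BYTES, "big")


report = (b"\x80" + bytes(31)) * 2 + bytes(32)
supported, changeable, enabled = split_report(report)
print(bit(supported, EXTENDED_CACHE), bit(enabled, EXTENDED_CACHE))
```

A client written this way only inspects the positions it knows about, which is in line with the requirement to ignore bit settings defined after the client program was designed.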

TLB Block Invalidate Characteristics The Block Invalidate option allows for the removal of multiple page table entries with a single platform wide TLB invalidate sequence, providing significantly improved performance when removing a virtual memory object. The size of the block (the number of consecutive virtual memory pages) that is processed by a single TLB invalidate sequence is implementation dependent. This block size might also be dependent upon the page sizes of the TLB entries. This block size represents the upper bound of the number of pages that may be processed in a single operation, for example in a single call to H_BLOCK_REMOVE. This system parameter provides the client code the characteristics of the implementation’s TLB invalidate operations. The TLB Invalidate Characteristics return string is a variable length series of bytes which contains one or more TLB Block Invalidate Specifiers as defined in Table 108, “TLB Block Invalidate Characteristics Specifier Format,” on page 253. If the implementation invalidates different sized blocks for different page size encodings, there will be multiple “TLB Block Invalidate Characteristics Specifiers” within the returned string.
TLB Block Invalidate Characteristics Specifier Format
Byte Offset | Bit Number in Byte | Description
0 | 0 - 7 | LOG base 2 of the TLB invalidate block size being specified
1 | 0 - 7 | Number of page sizes (N) that are supported for the specified TLB invalidate block size
2 - (N+1) | 0 | PTE “L” bit: 0 = 4K page in a segment whose base page size is 4K; 1 = page size and segment base size per bits 2 - 7
2 - (N+1) | 1 | Reserved
2 - (N+1) | 2 - 7 | Encoded segment base page size and actual page size per Book IVa

R1--1. For the Block Invalidate option with the System Parameters option: For the Block Invalidate system parameter, the ibm,get-system-parameter RTAS call must never return a Status of -9002 (Not Authorized). R1--2. For the Block Invalidate option with the System Parameters option: If the Block Invalidate option is enabled for the partition, the platform must provide in response to the ibm,get-system-parameter for parameter token 50 the one or more TLB Block Invalidate Specifiers for the calling partition as described in . R1--3. For the Block Invalidate option with the System Parameters option: If the Block Invalidate option is disabled for the system/partition, the platform must provide in response to the ibm,get-system-parameter for parameter token 50 the two byte value 0x0000. R1--4. For the Block Invalidate option with the System Parameters option: For the Block Invalidate system parameter, the ibm,set-system-parameter RTAS call must always return a Status of -9002 (Setting not allowed/authorized).

Energy Management Tuning Parameters (EMTP) The energy management tuning parameters are reported. Each parameter occupies its own 8-byte self-defining entry. As many energy management tuning parameter entries as are supported by the system are reported, subject to the limitation of the buffer length. Each reported parameter entry is formatted per . Format of the Energy Management Tuning Parameter Entry Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 Byte 5 Byte 6 Byte 7 Parameter Identifier (See for definition values.) Parameter Units (See for definition values.) Current Parameter Value Minimum Parameter Value Maximum Parameter Value

Definition of the Energy Management Tuning Parameters

Parameter ID  Definition
0x01          Utilization threshold for increasing frequency
0x02          Utilization threshold for decreasing frequency
0x03          Number of samples for computing utilization statistics
0x04          Step size for going up in frequency
0x05          Step size for going down in frequency
0x06          Delta percentage for determining active cores
0x07          Utilization threshold to determine active cores with slack
0x08          Enable/Disable frequency delta between cores
0x09          Maximum frequency delta between cores
0x50          Idle Power Saver enabled/disabled
0x51          Delay time to enter Idle Power Saver
0x52          Utilization threshold to enter Idle Power Saver
0x53          Delay time to exit Idle Power Saver
0x54          Utilization threshold to exit Idle Power Saver

All other Parameter ID values are reserved. Should calling software encounter a parameter ID value that was reserved at the time the software was written, it shall ignore that specific entry, and only that entry.

Definition of the Energy Management Parameter Unit Values

Parameter Units  Definition
0x00             Parameter can only be either 1 (enabled) or 0 (disabled)
0x01             Parameter is time in seconds, e.g. 10 = 10 seconds
0x02             Parameter is a percentage, e.g. 10 = 10%
0x03             Parameter is in 10ths of a percent, e.g. 15 = 1.5%
0x04             Parameter is an integer

All other Parameter Unit values are reserved. Should calling software encounter a parameter unit value that was reserved at the time the software was written, it shall ignore that specific entry, and only that entry.
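The entry format and the "ignore reserved values" rule above can be illustrated with a short sketch. The byte assignment used here is an assumption, not something this section states explicitly: byte 0 is taken as the Parameter Identifier, byte 1 as the Parameter Units, and the current, minimum, and maximum values as unsigned big-endian 16-bit fields in bytes 2 - 3, 4 - 5, and 6 - 7. The input is assumed to hold only the entries, and all names are hypothetical.

```python
import struct
from typing import List, NamedTuple

# Identifier and unit values defined in the tables above; anything else
# is treated as reserved and the entry carrying it is skipped.
KNOWN_PARAMETER_IDS = set(range(0x01, 0x0A)) | set(range(0x50, 0x55))
KNOWN_UNITS = {0x00, 0x01, 0x02, 0x03, 0x04}


class EmtpEntry(NamedTuple):
    param_id: int
    units: int
    current: int
    minimum: int
    maximum: int


def parse_emtp_entries(data: bytes) -> List[EmtpEntry]:
    """Decode whole 8-byte entries, skipping reserved IDs or units."""
    entries = []
    # Trailing bytes that do not form a complete entry are ignored.
    for offset in range(0, len(data) - len(data) % 8, 8):
        # Assumed layout: two single bytes, then three 16-bit values.
        entry = EmtpEntry(*struct.unpack_from(">BBHHH", data, offset))
        if entry.param_id not in KNOWN_PARAMETER_IDS:
            continue
        if entry.units not in KNOWN_UNITS:
            continue
        entries.append(entry)
    return entries


def describe_value(value: int, units: int) -> str:
    """Render a value according to its Parameter Units code."""
    if units == 0x00:
        return "enabled" if value == 1 else "disabled"
    if units == 0x01:
        return f"{value} s"
    if units == 0x02:
        return f"{value}%"
    if units == 0x03:
        return f"{value / 10}%"
    return str(value)  # 0x04: plain integer
```

Skipping only the offending entry, rather than rejecting the whole buffer, mirrors the rule that reserved values affect that entry and only that entry.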

R1--1. For the EMTP option with the System Parameters option: For the EMTP system parameter, the ibm,get-system-parameter RTAS call must never return a Status of -9002 (Not Authorized). R1--2. For the EMTP option with the System Parameters option: If the EMTP option is enabled for the partition, the platform must provide in response to the ibm,get-system-parameter for parameter token 52 the Energy Management Tuning Parameters for the calling system as described in this section. R1--3. For the EMTP option with the System Parameters option: If the EMTP option is disabled for the system/partition, the platform must provide in response to the ibm,get-system-parameter for parameter token 52 the two byte value 0x0000. R1--4. For the EMTP option with the System Parameters option: For the EMTP system parameter, the ibm,set-system-parameter RTAS call must always return a Status of -9002 (Setting not allowed/authorized).