This adds a page-walk cache (PWC) which stores PDEs from the page
tables at the 2MB, 1GB and 512GB levels, provided they point to tables
with 512 entries and map addresses below 4PB. The PWC also stores
PTEs for 2MB large pages. It uses a 512 x 64-bit block RAM organized as
64 sets, each set occupying 8 words of RAM and holding 4 ways. The
valid bit, page size, leaf indication (PTE vs. PDE), and PID for all 4
ways are stored in the first 64-bit word of each set, so that
invalidate-all and invalidate-by-PID only need to touch one word per
set and can be done in 64 cycles.
The MMU test (tests/mmu/mmu.c) is modified to use a three-level tree
mapping a total of 512GB, where the 1G and 2M levels can be cached in
the PWC.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds a 256-entry, 4-way set associative TLB which stores mappings
from virtual addresses composed of a PID and EA to 4kB real page
addresses. Because each entry is tagged with a PID value, there is no
need to flush it when the PIDR register is changed, unlike the L1 TLBs
in the icache and dcache. This should improve performance when
context switches between processes are frequent. EAs are assumed to
lie in a 4PB range (52-bit) as both of the architected page table
formats used by Linux map a 4PB space per PID (ignoring quadrant
bits).
A 512 x 64-bit block RAM is used to store both tags and data (PTE
values) for the TLB. The RAM is divided into 64 blocks of 8 words
(each word being 64 bits), one block per set, giving 128 bits per
entry. In order to speed up flush-by-PID operations, the valid bits,
PID tags and two address tag bits are stored in a single 64-bit word
(word 0 of the block). Flush-by-PID operations read word 0 of each
block and write back zeroes to the entries which match the PID being
flushed. Flush-all operations just write zeroes to word 0 of every
block.
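The flush operations described above can be sketched in C as a software
model. This is only an illustration: the commit does not give the bit
layout of word 0, so the 16-bit-per-way field layout, the macro names
and the helper make_field() below are all assumptions.

```c
#include <stdint.h>

#define TLB_SETS        64  /* 64 blocks of 8 words in the 512-word RAM */
#define WORDS_PER_SET   8
#define TLB_WAYS        4

/* ASSUMED word-0 layout (not taken from the commit): one 16-bit field
 * per way, bit 0 = valid, bits 2:1 = address tag bits, bits 15:3 = PID. */
#define FIELD_BITS      16
#define FIELD_MASK      0xffffULL
#define FIELD_VALID     0x1ULL
#define FIELD_PID_SHIFT 3
#define FIELD_PID_MASK  0x1fffULL

static uint64_t tlb_ram[TLB_SETS * WORDS_PER_SET];

/* Hypothetical helper: build one way's field for word 0. */
static uint64_t make_field(unsigned pid, unsigned tag2)
{
    return FIELD_VALID | ((uint64_t)(tag2 & 3) << 1) |
           (((uint64_t)pid & FIELD_PID_MASK) << FIELD_PID_SHIFT);
}

/* Read word 0 of each block and zero the fields whose PID matches. */
static void tlb_flush_pid(unsigned pid)
{
    for (int set = 0; set < TLB_SETS; set++) {
        uint64_t w0 = tlb_ram[set * WORDS_PER_SET];
        for (int way = 0; way < TLB_WAYS; way++) {
            uint64_t f = (w0 >> (FIELD_BITS * way)) & FIELD_MASK;
            if ((f & FIELD_VALID) &&
                ((f >> FIELD_PID_SHIFT) & FIELD_PID_MASK) ==
                    (pid & FIELD_PID_MASK))
                w0 &= ~(FIELD_MASK << (FIELD_BITS * way));
        }
        tlb_ram[set * WORDS_PER_SET] = w0;
    }
}

/* Flush-all needs no read: just zero word 0 of every block. */
static void tlb_flush_all(void)
{
    for (int set = 0; set < TLB_SETS; set++)
        tlb_ram[set * WORDS_PER_SET] = 0;
}
```

Either loop touches exactly one word per block, which is the point of
keeping all the valid bits and PID tags in word 0.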
A pseudo-LRU array implemented in a separate 64 x 3-bit RAM is used to
determine a victim entry to be evicted when a new entry is to be
written and all four entries in the set are valid.
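As an illustration of how 3 bits per set can pick a victim among 4
ways, here is a minimal tree pseudo-LRU sketch in C. The bit encoding,
the function names, and the policy of filling the lowest-numbered
invalid way first are assumptions; the commit only states that a
64 x 3-bit pseudo-LRU array is used when all four entries are valid.

```c
#include <stdint.h>

/* ASSUMED encoding of the 3 PLRU bits for one set:
 *   bit 0: 0 => next victim is in ways 0-1, 1 => in ways 2-3
 *   bit 1: victim within ways 0-1 (0 => way 0, 1 => way 1)
 *   bit 2: victim within ways 2-3 (0 => way 2, 1 => way 3) */
static unsigned plru_victim(uint8_t bits)
{
    if ((bits & 1) == 0)
        return (bits & 2) ? 1 : 0;
    return (bits & 4) ? 3 : 2;
}

/* After using 'way', point the tree bits away from it. */
static uint8_t plru_touch(uint8_t bits, unsigned way)
{
    if (way < 2) {
        bits |= 1;                      /* next victim from ways 2-3 */
        if (way == 0)
            bits |= 2;                  /* within 0-1, prefer way 1 */
        else
            bits &= (uint8_t)~2u;       /* within 0-1, prefer way 0 */
    } else {
        bits &= (uint8_t)~1u;           /* next victim from ways 0-1 */
        if (way == 2)
            bits |= 4;                  /* within 2-3, prefer way 3 */
        else
            bits &= (uint8_t)~4u;       /* within 2-3, prefer way 2 */
    }
    return bits & 7;
}

/* Use an invalid way if there is one; only consult PLRU when all
 * four ways are valid (valid_mask has one bit per way). */
static unsigned choose_way(unsigned valid_mask, uint8_t bits)
{
    for (unsigned way = 0; way < 4; way++)
        if (!((valid_mask >> way) & 1))
            return way;
    return plru_victim(bits);
}
```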
The sets are indexed using a 6-bit hash of some of the EA and PID
bits.
Now that tlbie is actually using the PID argument in RS, we need to
make sure that the code in tests/mmu/mmu.c sets it correctly.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This tests for the bug where a faulting load near the end of a page,
with the following page unmapped, could cause a DSI followed
incorrectly by an ISI shortly afterwards.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Implementations without hypervisor/LPAR support are permitted by the
architecture, but should have MSR[HV] forced to be 1 at all times, not
0, and should implement various instructions and registers that are
only accessible in hypervisor mode.
This commit implements MSR[HV] as a constant 1 bit and adds the hrfid
instruction, which behaves exactly the same as rfid except that it
reads HSRR0/1 instead of SRR0/1. We already have HSRR0/1 and HSPRG0/1
implemented.
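The relationship between rfid, hrfid and the constant HV bit can be
modelled roughly as follows. This is a simplified C sketch, not the
VHDL: the struct and function names are hypothetical, and the other
MSR bit adjustments that the architecture specifies for rfid and for
interrupt entry are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_HV (1ULL << 60)     /* hypervisor state bit of the MSR */

/* Hypothetical register file for the model. */
struct cpu_state {
    uint64_t nia, msr;
    uint64_t srr0, srr1;
    uint64_t hsrr0, hsrr1;
};

/* rfid (use_hsrr = false) and hrfid (use_hsrr = true) differ only
 * in which pair of save/restore registers they read. */
static void return_from_interrupt(struct cpu_state *c, bool use_hsrr)
{
    uint64_t addr = use_hsrr ? c->hsrr0 : c->srr0;
    uint64_t msr  = use_hsrr ? c->hsrr1 : c->srr1;

    c->nia = addr & ~3ULL;      /* instruction addresses are word aligned */
    c->msr = msr | MSR_HV;      /* HV reads as a constant 1 */
}

/* Hypervisor interrupts save state in HSRR0/1 instead of SRR0/1. */
static void take_interrupt(struct cpu_state *c, uint64_t vector,
                           bool hypervisor)
{
    if (hypervisor) {
        c->hsrr0 = c->nia;
        c->hsrr1 = c->msr;
    } else {
        c->srr0 = c->nia;
        c->srr1 = c->msr;
    }
    c->nia = vector;
}
```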
When HV=1, Linux expects external interrupts to arrive as hypervisor
interrupts, so this adds support for hypervisor interrupts (i.e.,
those that set HSRR0/1) and makes the external interrupt be a
hypervisor interrupt. (If we had an LPCR register, the LPES bit would
control this, but we don't.) The xics test is updated to read HSRR0/1
after an external interrupt.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements a 1-entry partition table, so that instead of getting
the process table base address from the PRTBL SPR, the MMU now reads
the doubleword pointed to by the PTCR register plus 8 to get the
process table base address. The partition table entry is cached.
Having the PTCR and the vestigial partition table reduces the amount
of software change required in Linux for Microwatt support.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The tests were using MSR values that did not have MSR_SF or MSR_LE
set. Fix this so that the tests still work when 32-bit and BE modes
are implemented.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds the PID register and repurposes SPR 720 as the PRTBL
register, which points to the base of the process table. There
doesn't seem to be any point in implementing the partition table given
that we don't have hypervisor mode.
The MMU caches entry 0 of the process table internally (in pgtbl3)
plus the entry indexed by the value in the PID register (pgtbl0).
Both caches are invalidated by a tlbie[l] with RIC=2 or by a move to
PRTBL. The pgtbl0 cache is invalidated by a move to PID. The dTLB
and iTLB are cleared by a move to either PRTBL or PID.
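The invalidation rules above can be summarised as a small C model.
The struct and function names are hypothetical, and any effect of
tlbie on the dTLB and iTLB contents is not modelled here since this
paragraph does not describe it.

```c
#include <stdbool.h>

/* Hypothetical model of the cached state described above. */
struct mmu_cache {
    bool pgtbl0_valid;  /* process table entry indexed by PID */
    bool pgtbl3_valid;  /* process table entry 0 */
    bool tlbs_valid;    /* dTLB and iTLB hold entries */
};

/* Move to PRTBL: both cached entries are stale, TLBs are cleared. */
static void on_mtspr_prtbl(struct mmu_cache *m)
{
    m->pgtbl0_valid = false;
    m->pgtbl3_valid = false;
    m->tlbs_valid = false;
}

/* Move to PID: only the PID-indexed entry is stale, TLBs are cleared. */
static void on_mtspr_pid(struct mmu_cache *m)
{
    m->pgtbl0_valid = false;
    m->tlbs_valid = false;
}

/* tlbie[l] with RIC=2: both cached process table entries are dropped. */
static void on_tlbie_ric2(struct mmu_cache *m)
{
    m->pgtbl0_valid = false;
    m->pgtbl3_valid = false;
}
```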
Which of the two page table root pointers is used (pgtbl0 or pgtbl3)
depends on the MSB of the address being translated. Since the segment
checking ensures that address(63) = address(62), this is sufficient to
map quadrants 0 and 3.
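A minimal C sketch of this root selection, with bit 63 taken as the
MSB as in the text. The function names are hypothetical, and only the
address(63) = address(62) property stated above is modelled; the real
segment check may cover more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only quadrants 0 (top bits 00) and 3 (top bits 11) pass. */
static bool segment_ok(uint64_t ea)
{
    return ((ea >> 63) & 1) == ((ea >> 62) & 1);
}

/* With the check above passed, the MSB alone picks the root:
 * quadrant 0 uses pgtbl0, quadrant 3 uses pgtbl3. */
static uint64_t select_root(uint64_t ea, uint64_t pgtbl0, uint64_t pgtbl3)
{
    return (ea >> 63) ? pgtbl3 : pgtbl0;
}
```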
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds tests of instruction translation to the mmu test.
This also clears the BSS and improves the linker script.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds tests to check that the MMU and dTLB are translating
addresses and checking permissions correctly.
We use a simple 2-level radix tree. The radix tree maps 2GB of
address space and has a 1024-entry page directory pointing to
512-entry page table pages.
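The geometry of this tree can be checked with a little arithmetic,
sketched here in C. The 4kB page size is an inference from the numbers
above (1024 x 512 x 4kB = 2GB), and the struct and function names are
hypothetical rather than taken from tests/mmu/mmu.c.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumed 4kB pages */
#define PT_BITS    9    /* 512-entry page table pages */
#define PD_BITS    10   /* 1024-entry page directory */

struct radix_index {
    unsigned pd;        /* index into the page directory */
    unsigned pt;        /* index into the page table page */
    unsigned offset;    /* byte offset within the page */
};

/* Split an effective address into the two table indices and offset. */
static struct radix_index split_ea(uint64_t ea)
{
    struct radix_index r;

    r.pd = (unsigned)((ea >> (PAGE_SHIFT + PT_BITS)) & ((1u << PD_BITS) - 1));
    r.pt = (unsigned)((ea >> PAGE_SHIFT) & ((1u << PT_BITS) - 1));
    r.offset = (unsigned)(ea & ((1u << PAGE_SHIFT) - 1));
    return r;
}

/* Total address space covered: 2^(10 + 9 + 12) bytes = 2GB. */
static uint64_t mapped_bytes(void)
{
    return 1ULL << (PD_BITS + PT_BITS + PAGE_SHIFT);
}
```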
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>