microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	6d35ddb246	Merge pull request #466 from paulusmack/master Implement a level-2 TLB and a page walk cache in the MMU	2 weeks ago
Paul Mackerras	4892939524	ECPIX-5: Add a GPIO controller and connect i2c RTC chip on PMOD 3 (#467) The i2c bus uses GPIOs 22 and 23, as on the Arty board. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 weeks ago
Paul Mackerras	73e6130e90	tests/mmu: Add a test to verify that tlbie by PID works Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 weeks ago
Paul Mackerras	9c66ab9153	MMU: Implement a page-walk cache This adds a page-walk cache (PWC) which stores PDEs from the page tables at the 2MB, 1GB and 512GB levels, provided they point to tables with 512 entries and map addresses below 4PB. The PWC also stores PTEs for 2MB large pages. It uses a 512 x 64b block RAM structured as 64 sets, each set using 8 words of RAM and storing 4 ways. The valid bit, page size, leaf indication (PTE vs. PDE), and PID for all 4 ways are stored in the first 64b word so that invalidate-all and invalidate-by-PID can be done in 64 cycles. The MMU test (tests/mmu/mmu.c) is modified to use a three-level tree mapping a total of 512GB, where the 1G and 2M levels can be cached in the PWC. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 weeks ago
Paul Mackerras	11d299e161	MMU: Add a TLB to store 4kB page translations This adds a 256-entry, 4-way set associative TLB which stores mappings from virtual addresses composed of a PID and EA to 4kB real page addresses. Because each entry is tagged with a PID value, there is no need to flush it when the PIDR register is changed, unlike the L1 TLBs in the icache and dcache. This should improve performance when context switches between processes are frequent. EAs are assumed to lie in a 4PB range (52-bit) as both of the architected page table formats used by Linux map a 4PB space per PID (ignoring quadrant bits). A 512 x 64-bit block RAM is used to store both tags and data (PTE values) for the TLB. The RAM is divided into 64 groups of 8 words (each word being 64 bits), giving 128 bits per entry. In order to speed up flush-by-PID operations, the valid bits, PID tags and two address tag bits are stored in a single 64-bit word (word 0 of the block). Flush-by-PID operations read word 0 of each block and write back zeroes to the entries which match the PID being flushed. Flush-all operations just write zeroes to word 0 of every block. A pseudo-LRU array implemented in a separate 64 x 3-bit RAM is used to determine a victim entry to be evicted when a new entry is to be written and all four entries in the set are valid. The sets are indexed using a 6-bit hash of some of the EA and PID bits. Now that tlbie is actually using the PID argument in RS, we need to make sure that the code in tests/mmu/mmu.c sets it correctly. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 weeks ago
Paul Mackerras	6081646638	loadstore1: Fix reading of PIDR and PTCR via debug interface (#462) Commit `0a11e8455f` ("core: Implement hashst and hashchk instructions", 2025-01-23) expanded the SPR selector used in loadstore1 from 3 to 4 bits because the addition of the hash key SPR took the number of SPRs to be addressed from 8 to 9. In the process, PTCR and PIDR moved from 0/1 to 8/9, but the assignment of sprsel from dbg_spr_addr in the loadstore1_2 process wasn't updated to reflect this. As it happened, the hash key SPRs were subsequently moved into the SPR RAM, reducing the number of SPRs in loadstore1 back to 8. Also, the SPR select bit sent to the MMU never depended on dbg_spr_addr, meaning that reading PTCR and PIDR via the debug interface would have randomly supplied one or the other. To fix this, revert the part of commit `0a11e8455f` which expanded the sprsel fields and variables, reducing them back to 3 bits and restoring PTCR/PIDR to the 0/1 encodings. Also make the SPR read address sent to the MMU come from dbg_spr_addr when we are not executing an mfspr in loadstore1. With this, PTCR and PIDR can be read correctly via the debug interface. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 months ago
Paul Mackerras	efd0571b5f	Merge pull request #461 from paulusmack/master Improvements for the Arty A7 board	3 months ago
Paul Mackerras	81792f599b	arty a7: Connect SD card interface to microSD socket on LCD touchscreen board If the generic USE_LCD is false, the first SD card controller (mmcblk0 in Linux) is connected to pmod HA; if USE_LCD is true, it is connected to the SD card slot on the touchscreen/LCD panel. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	185008c907	Merge pull request #460 from paulusmack/fixes Fix icache and dcache bugs - Fix icache bug causing spurious ISI interrupts - Fix dcache bug causing corrupted load data	3 months ago
Paul Mackerras	c7531e592c	arty a7: Add facilities to get A/D conversions from the touchscreen This adds connections from the A2 - A5 inputs on the Arty A7 to the XADC module in the Artix-7 plus a way for software to access the XADC via its DRP port, and a status register to tell software when conversion sequences are done. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	172eae61cb	arty a7: Add an interface for a TFT LCD touchscreen This adds an interface for an Arduino-compatible LCD touchscreen. The screen module plugs directly on to the Arduino/chipKit shield connector on the Arty A7. Unfortunately, the slightly strange way the resistive touchscreen is brought out (connected to the D0, D1, RS and CS pins) combined with the 200 ohm protection resistors on the Arty board mean that some hardware hacks to the module are necessary. I rewired mine so that D0 and D1 are on the A4 and A5 pins and the reset is where D0 was (shield I/O 8). This interface is suitable for boards with a HX8347 driver chip. The timing may not be quite suitable for other driver chips. The interface is a byte which can be read and written at 0xc8050000, containing an index register, and a 1-8 byte data register at 0xc8050008. Reading at offsets 1 to 7 from those addresses yields the same value as at offset 0. Writing 64 bits to the data register writes the bytes at offset 1, 0, 3, 2, 5, 4, 7, 6 in that order to the driver chip. This allows pixel data to be transferred using 64-bit writes, ending up in the frame buffer in the expected order (for 16-bit pixels, the driver chip expects MS byte then LS byte). 32-bit writes do 1, 0, 3, 2, and 16-bit writes do 1, 0. The touchscreen support so far is a 1-byte register containing bits to set RS, D0, D1 and CS high or low or make them tri-state. There is nothing to do analog conversions of the signal levels at this stage. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	7f4e0185b5	xilinx_mult: Eliminate a Vivado warning Since the p1 instance of DSP48 has CREG = 0, we should ground the CEC input, as mentioned in a Vivado warning. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	d4fec95044	arty a7: Turn on LED 5 when SD card command-done interrupt is enabled This snoops writes to the interrupt enable registers of the SD card interfaces and records whether the command-done interrupt is enabled. LED 5 is turned on whenever either interface has this interrupt enabled in order to serve as a disk activity indicator. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	dcd1072c25	arty a7: Put the top 8 GPIOs on pmod B Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	6100e7b50e	dcache: Fix another dcache bug causing occasional load data corruption Commit `a7420c2a4d` ("dcache: Fix bug causing load to return incorrect data", 2025-12-27) fixed the main cause of the bug, but left a 1-cycle window where the same problem could still occur. If a touch that misses in the dcache is followed immediately by a load to a different cache line with the same index, then because the touch is completed and the new tag is written for the line being touched in the same cycle, it is possible for the following load to use the previous (stale) tag value for the line. If that old value matches the load (i.e., the load would have been a hit in the absence of the touch) then the load will incorrectly return data from the line being touched. Fix this by delaying the completion of the touch until after the new tag has been written, which is indicated by r1.write_tag = 0. Fixes: `a7420c2a4d` ("dcache: Fix bug causing load to return incorrect data", 2025-12-27) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	a1d83ba91a	tests/mmu: Add a test for a faulting load near the end of a page This tests for the bug where a load near the end of a page, if the load faults and the following page isn't mapped, could cause a DSI followed incorrectly by an ISI shortly afterwards. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	41e341f260	icache: Clear fetch failed flag on flush This fixes a bug where a load that results in a DSI, if it is placed near the end of a page and the following page isn't mapped, can result in the core starting to take the DSI but then jumping off to the ISI vector. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	16c3eda1b1	arty a7: Rework status LED colours This frees up LEDs 4 and 5 by combining their status functions into LED 0, which is now black when the system is in reset and yellow when the system clock is not locked. On configurations without litedram, LED 0 now shows green rather than magenta. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	90df07b950	arty a7: Add connection to i2c RTC chip on port JD The I2C data is on GPIO 22 and the clock is on GPIO 23. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	4f06a01731	arty a7: Add a second SD card interface on pmod JC This adds a second SD card interface. The main complexity is in providing a wishbone switch/arbiter to multiplex the two DMA wishbones from the two interfaces to a single wishbone going to the soc module. There is a new syscon info reg bit to indicate the presence of the second litesdcard. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	6366fbb5a7	arty a7: Simplify GPIO connections Currently, GPIO lines 0 - 8 drive three of the 3-colour LEDs on output, but on input read the state of the pins labelled IO10 - IO13, IO26 - IO29 and IO8 on the Arty board. Then GPIO lines 10 - 17 drive IO10 - IO13 and IO26 - IO29 on output, but on input read the 4 buttons and 4 switches. To simplify all this and prepare for future changes, this just detaches IO8, IO10 - IO13 and IO26 - IO29, so now GPIO 0 - 8 read 0 on input, and GPIO 10 - 17 do nothing on output. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	8339795d0c	Merge pull request #459 from paulusmack/fixes Bug fixes for the FPU and dcache	3 months ago
Paul Mackerras	6eaf22ea95	dcache: Fix stalls that occurred occasionally with dcbt followed by ld This fixes a race condition that causes a hang in a situation where the program does a dcbt to a cache line, then hits a TLB miss causing some requests to come in to the dcache from the MMU while the cache line requested by the dcbt has not yet started to come in, then does a load to an address in the same cache line requested by the dcbt. If it happens that the data for the load arrives in the same cycle that the load is doing the cache tag and TLB lookups, the dcache_slow process correctly recognizes that the request can be satisfied immediately but incorrectly sends the done signal to the MMU rather than loadstore1, because the logic looks at r1.mmu_req not req.mmu_req. Fix it to use req.mmu_req. Also make sure that RELOAD_WAIT_ACK state only completes a touch that was the one that caused entry to RELOAD_WAIT_ACK state, not a subsequent touch, which will have r1.req.hit_reload = 0. (A touch to the same line that is already being reloaded would be treated as a hit.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 months ago
Paul Mackerras	fdd98d88d4	FPU: Fix zero result detection in fmadd-family instructions With the multiply-add instructions, it is possible to get into state FMADD_6 with R containing a value >= 8.0. If the value is exactly 8.0, the logic will incorrectly conclude that the result is zero because it only tests bits up to UNIT_BIT + 2. Fix this by testing up to UNIT_BIT + 3, and add a test case to the FPU test that triggers this situation. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	d02e8e6f93	Merge pull request #458 from paulusmack/fixes Fixes for bugs found in dcache, loadstore1 and execute1.	4 months ago
Paul Mackerras	84eebf5c7c	execute1: Fix bug causing SRR0 to be set to 4 more than the correct value If an scv (or sc) instruction is executed and an asynchronous interrupt occurs on the following instruction (e.g. the first instruction of the scv handler), the address written to SRR0 will be the address of that following instruction + 4. The reason is that ex1.advance_nia will still be set from the execution of the sc[v]. Fix this by clearing v.advance_nia in execute1_1. (This only shows up for asynchronous interrupts with scv, not sc, because sc clears MSR[EE]. It should show up for synchronous interrupts with both sc and scv, but that has not been demonstrated.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	aadd22267f	execute1: Don't increment the LOG_ADDR SPR after reading it Reading the LOG_DATA SPR is supposed to increment the log address, and reading LOG_ADDR is not supposed to, but currently this is the wrong way around. Fix it. Also add a related comment. Fixes: `8f7326a824` ("core: Implement various SPRs which read zero and ignore writes", 2025-04-10) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	a7420c2a4d	dcache: Fix bug causing load to return incorrect data If a touch is immediately followed by a load to a different address which has the same index as the touch address, and both are cache misses, it is possible for the load to be treated as if it is to the same cache line as the touch, and thus return data from the line being touched rather than the line being loaded from. For example, if the touch is to 0x1c20 and the load is to 0x2c20, and the state left in r1.store_ways by an earlier operation happens to match the PLRU victim way, the load will return data from 0x1c20. This happens because the touch completes immediately, meaning that the load gets processed before r1.store_ways and the cache tag for the line being touched have been set correctly, leading to a chance that the load can match when it shouldn't (or not match when it should). To fix this, complete the touch after one cycle, in RELOAD_WAIT_ACK state, rather than immediately. Also, for touches, consider hit_reload = 1 equivalent to a cache hit. If the line is being reloaded then the touch doesn't need to do anything. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	c78d9b32ef	loadstore1: Ensure tlbie instructions get completed Since commit `c938246cc8` ("dcache: Simplify addressing of the dcache TLB", 2025-04-05), tlbie instructions have been sent down the loadstore pipe with both req.dc_req and req.mmu_op set, so that the tlbie gets sent both to the data cache and the MMU. This is so that the relevant TLB hit signals are set correctly in the dcache for a single-page invalidation. However, this means that loadstore1 was not sending a completion to writeback for the tlbie. Normally this doesn't cause a problem, but if the tlbie is followed by an instruction that is marked 'single-pipe' in the decode1 tables, such as sync (any variant), decode2 will then stall forever waiting for the tlbie to complete before issuing the following instruction. To fix this, clear req.dc_req in the second loadstore stage for a tlbie (actually for any MMU operation, but tlbie is the only instruction that would have dc_req set). Fixes: `c938246cc8` ("dcache: Simplify addressing of the dcache TLB", 2025-04-05) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	f9dc3ecdc8	execute1: Correct FSCR[IC] value for prefix unavailable interrupt FSCR[IC] should be set to 13 for a prefix unavailable interrupt, not 11. To avoid this type of mistake, use the same symbols for setting IC as for the bit numbers in the rest of FSCR. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 months ago
Paul Mackerras	a1624a50da	Merge pull request #457 from paulusmack/fixes FPU fixes, mostly for bugs found by comparing results from random instruction sequences (generated by simple_random) with POWER9.	4 months ago
Paul Mackerras	09b340e845	FPU: Update committed FPSCR value correctly The committed FPSCR is updated in the cycle where an FPU instruction signals completion. Since we update the FPRF field in the FPSCR in that same cycle, the value put into r.comm_fpscr needs to include the new FPRF value. Otherwise, a subsequent flush (for example, due to the following instruction being an illegal instruction that has to be emulated) will drop the FPSCR update. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	1ad8848655	FPU: Improve zero result detection and simplify final states This improves detection of results that are exactly zero in FINISH state by noting that on entry to FINISH state, if R is zero then X must also be zero, so no rounding needs to be done and no underflow exists. Therefore we can set rcls_op = RCLS_TZERO to test for zero and exit early if R = 0. The RCLS_TZERO test now tests the whole of R just in case. The rest of the following states have been streamlined and simplified. In cases of underflow, we only need to take action before rounding in the UE=0 case (disabled underflow exception), where we need to denormalize before rounding. For enabled underflow cases we just use the existing NORMALIZE state, which lets us remove NORM_UFLOW state. On entry to ROUNDING state, R can be zero or denorm only for round to integer instructions (fri) or for disabled underflow exception cases. Note that in case of underflow with UE=0, the exception is only actually signalled if there is loss of accuracy, i.e. if FPSCR[FI] will be set. This is now done at the end of ROUNDING state. For underflow with UE=1, we go to a new ROUND_UFLOW_EN state to adjust the exponent from ROUNDING, ROUNDING_2 or ROUNDING_3 state. In the ROUNDING states, we avoid shifting left to normalize a result with exponent <= -1022, because if we did we would then just need to denormalize again. This lets us get rid of DENORM state. Finally, noticing that DO_FRSP_2 state does much the same as FINISH state lets us remove DO_FRSP_2 state and go to FINISH state from DO_FRSP. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	f8a11420ca	FPU: Check for rounding overflow in 32-bit convert-to-integer operations Without this, rounding a value of 0xFFFFFFFF up, giving 0x100000000, will yield an incorrect result of zero. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	6fe4b549f5	FPU: Improve accuracy in multiply-add almost-cancellation cases There are two paths for multiply-add instructions; one where the product is larger or nearly the same as the addend, which does the addition/subtraction in the multiplier with 128-bit accuracy; the other is used when the addend is clearly larger, which shifts the product right before doing the addition/subtraction in 64-bit arithmetic. The threshold for the second path is that B_exp has to be greater than A_exp + C_exp + 1, the +1 being because the product mantissa can be greater than 2. This increases the +1 to +2 to make sure that the 128-bit path is used when there is any chance of cancellation of the high-order bits of the sum. With the +1 threshold we could still get close to cancellation when the mantissas of A and C were nearly 2 and the mantissa of B was 1. This improves accuracy and avoids the need to do a 120-bit subtraction in the second path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	80c81b58ef	FPU: Generate correct result sign when B is denormal If a subtraction A - B is done where A is in normalized form with an exponent of -1022, and B is denormal, an inconsistency arises between the comparison of the raw exponents in the first cycle, which sees A.exp (0x001) > B.exp (0x000), and the comparison in DO_FADD state, which sees r.a.exponent (-1022) = r.b.exponent (-1022). Consequently we get r.add_bsmall = 0 and the subtraction is done the wrong way around, yielding the wrong sign for the result. Fix this by setting r.add_bsmall according to the comparison of raw exponents in the first cycle and then using it in DO_FADD state. Also add a test case for this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	f631dcd700	FPU: Set FPRF correctly on multiply result that underflows rcls_op being set to RCLS_TZERO was not detecting a zero result after rounding for a multiply result that underflows, because S still had low bits of the product. To fix this, remove the 's_nz = 0' from the RCLS_TZERO test. We can't then use this test in the FMADD_6 state, but we really shouldn't be testing for zero there, before rounding, so remove that. Also simplify FMADD_6 state by not setting rs_norm and going always to FINISH state rather than going to NORMALIZE state. Add a test for this case (actually a fmadd with B=0). While here, remove a pointless assignment to f_to_multiply.valid in MULT_1 state, since r.first is never set here. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	b122577a4e	FPU: Be more careful about preserving low-order bits in multiply-add instrs Add code to check whether bits of S which don't get shifted into R are non-zero, and set X if they are, so that rounding in multiply-add instructions works correctly. This needs to be done after normalization in the case of very small results, where potentially all the non-zero bits in S do get shifted into R. Also fix an incorrect test case, and add another multiply-add test case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	59992eab90	FPU: Avoid doing overflow processing twice in OE=1 case Split the ROUND_OFLOW state into two, one which handles the OE=0 case (disabled overflow exception) and one which handles the OE=1 case (enabled overflow exception). This avoids a loop in the state diagram and prevents us from adding the exponent bias twice. Also correct a bug in ROUNDING_3 state where for single-precision operations which yield a result which is denormal in double-precision format, r.shift was set wrongly. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	9f27f60b26	FPU: Clear FPSCR[FR,FI] on overflow in convert-to-integer instructions Also simplify INT_CHECK state by going to INT_OFLOW on overflow. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	37edba4da7	FPU: Normalize B operand for multiply-add instructions Otherwise the result can get rounded incorrectly when B is denorm but the A * C product is much smaller. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	d33f31509b	FPU: Clear S in ADD_SHIFT state Otherwise, if this is a multiply-add instruction and the result needs to be shifted left, bits of the product in S will contaminate the final result. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	b8f7cbd894	FPU: Record bits shifted out of addend in fmadd-family instructions If the addend is smaller than the product and thus needs to be shifted right, record if any bits are lost from the right end in r.x, so that the result gets rounded correctly. Also add a test that checks one such case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	009ee1c9c5	FPU: Renormalize frsp operand if denormalized This arranges for the frsp operand to be renormalized if necessary. Without this, we can incorrectly get X set to 1 for denormalized operands, and hence the rounding may be done incorrectly. To make things clearer, we now have an explicit flag indicating when the B operand needs to be in normalized form. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	baf8f5f8c6	FPU: Force reserved FPSCR bit 11 to zero This ensures that the reserved FPSCR bit can never be set, by clearing it at the end of the fpu_1 process. Also remove a redundant setting of cr_result in the mcrfs code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	a18c462b27	FPU: Ignore stale P contents in short-circuit multiply-add When a multiply-add is done with A or C equal to zero, the actual multiplication operation is not done, hence P is not valid, so in FINISH state we shouldn't set X based on P being non-zero. Fix this by clearing the is_multiply flag in the short-circuit case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	41988e3b5f	FPU: Fix comparison of remainder in square root code The square root procedure needs to compare B - R^2 with 2R + 1 to decide whether to increment the square root estimate R by 1. It currently does this by putting 2R + 1 in B and using the pcmpb_lt and pcmpb_eq signals. This is not correct because the comparisons that generate those signals have a 2-bit shift embedded into them. Instead, put 2R + 1 into C and use pcmpc_lt/eq, which don't have the 2-bit shift. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	f3b9566ae2	FPU: Round to single precision for fcfid[u]s The fcfids and fcfidus instructions weren't rounding to single precision because r.longmask wasn't getting set. To fix this, set v.longmask to e_in.single for the fcfid* instructions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	e5651e2eab	FPU: Avoid adding bias twice in UE=1 underflow case In case of underflow with UE=1, ROUND_UFLOW state adds the exponent bias and then goes to NORMALIZE state if the value is not normalized. Then NORMALIZE state will go back to ROUND_UFLOW if the exponent is still tiny, resulting in the bias getting added twice. To avoid this, if ROUND_UFLOW needs to do normalization, it goes to a new NORM_UFLOW state which does the normalization and goes to ROUNDING state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
Paul Mackerras	a0755935f4	FPU: Normalize B for fmadd family instructions If B is denormalized, but the A*C product is much smaller, then the result is B; in the UE=1 case we need to normalize the result, and the left shift to do that can bring in low-order product bits from S and corrupt the result. To avoid this, make sure B is normalized. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 months ago
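
The byte ordering described in commit 172eae61cb (TFT LCD interface) can be illustrated with a small Python model. This is only a sketch of the ordering stated in the commit message, not code from the repository: the function name `lcd_byte_stream` is hypothetical, and the example assumes that 16-bit pixels are stored little-endian in memory.

```python
import struct

def lcd_byte_stream(data: bytes) -> bytes:
    """Return the bytes of one data-register write in the order they are sent.

    `data` holds the 2, 4 or 8 bytes of the write as they sit in memory
    (offset 0 first). Per the commit message, each byte pair is swapped,
    giving offsets 1, 0, 3, 2, 5, 4, 7, 6 for a 64-bit write.
    """
    if len(data) not in (2, 4, 8):
        raise ValueError("only 16-, 32- and 64-bit writes are modelled")
    out = bytearray()
    for i in range(0, len(data), 2):
        out.append(data[i + 1])  # odd offset goes out first
        out.append(data[i])      # then the even offset
    return bytes(out)

# Assumption: four 16-bit pixels stored little-endian in one 64-bit word.
pixels = [0x1234, 0xABCD, 0x0F0F, 0xF00F]
memory_image = struct.pack("<4H", *pixels)
stream = lcd_byte_stream(memory_image)
```

Under the little-endian assumption, `stream` carries each pixel as MS byte then LS byte, which is the order the commit message says the driver chip expects.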
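
The flush scheme described in commit 11d299e161 (level-2 TLB) can be sketched in Python as follows. This is a toy model under stated assumptions, not the VHDL implementation: the class and method names are hypothetical, and word 0 of each set is modelled as a list of per-way (valid, PID) pairs because the commit message does not give the exact bit layout. The geometry follows the message: 512 RAM words in groups of 8 give 64 sets, and 4 ways per set give 256 entries.

```python
NUM_SETS = 64   # 512 RAM words / 8 words per set
NUM_WAYS = 4    # 64 sets * 4 ways = 256 entries

class TlbModel:
    """Toy model of word 0 of each set: one valid bit and PID tag per way."""

    def __init__(self):
        # word0[s][w] is (valid, pid); the real bit packing is not modelled.
        self.word0 = [[(False, 0)] * NUM_WAYS for _ in range(NUM_SETS)]

    def insert(self, set_index, way, pid):
        self.word0[set_index][way] = (True, pid)

    def flush_by_pid(self, pid):
        """Read word 0 of each set and clear the ways whose PID tag matches."""
        for s in range(NUM_SETS):
            self.word0[s] = [
                (False, 0) if valid and tag == pid else (valid, tag)
                for valid, tag in self.word0[s]
            ]

    def flush_all(self):
        """Write zeroes to word 0 of every set."""
        for s in range(NUM_SETS):
            self.word0[s] = [(False, 0)] * NUM_WAYS

    def valid_count(self):
        return sum(valid for row in self.word0 for valid, _ in row)

tlb = TlbModel()
tlb.insert(3, 0, pid=7)
tlb.insert(3, 1, pid=9)
tlb.insert(40, 2, pid=7)
tlb.flush_by_pid(7)
remaining_after_pid_flush = tlb.valid_count()
tlb.flush_all()
remaining_after_full_flush = tlb.valid_count()
```

Because only word 0 of each set is touched, either kind of flush visits one word per set rather than every word of the RAM.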

1547 Commits (master)