Instead of having TLB invalidation and TLB load requests come through
the dcache main path, these operations are now done in one cycle
entirely based on signals from the MMU, and don't involve the TLB read
path or the dcache state machine at all. So that we know which way of
the TLB to affect for invalidations, loadstore1 now sends down a "TLB
probe" operation for tlbie instructions which goes through the dcache
pipeline and sets the r1.tlb_hit_* fields which are used in the
subsequent invalidation operation from the MMU (if it is a single-page
invalidation). TLB load operations write to the way identified by
r1.victim_way, which was set on the TLB miss that triggered the TLB
reload.
Since we are writing just one way of the TLB tags now, rather than
writing all ways with one way's value changed, we now pad each way to
a multiple of 8 bits so that byte write-enables can be used to select
which way gets written.
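As a rough sketch of the scheme (widths and names here are illustrative,
not the actual dcache code): each 45-bit way tag is padded to 48 bits, so
a way occupies exactly 6 bytes of the tag word and a write touches only
those bytes.

    library ieee;
    use ieee.std_logic_1164.all;

    entity tlb_tag_ram is
        port (
            clk    : in  std_ulogic;
            index  : in  natural range 0 to 31;           -- TLB set index
            way    : in  natural range 0 to 3;            -- way to write
            wr_en  : in  std_ulogic;
            wr_tag : in  std_ulogic_vector(44 downto 0);  -- assumed tag width
            rd_set : out std_ulogic_vector(4 * 48 - 1 downto 0)
        );
    end entity;

    architecture rtl of tlb_tag_ram is
        constant TAG_BITS : natural := 45;
        constant TAG_PAD  : natural := 48;   -- padded to a multiple of 8
        type ram_t is array (0 to 31) of
            std_ulogic_vector(4 * TAG_PAD - 1 downto 0);
        signal ram : ram_t := (others => (others => '0'));
    begin
        process(clk)
            variable padded : std_ulogic_vector(TAG_PAD - 1 downto 0);
        begin
            if rising_edge(clk) then
                if wr_en = '1' then
                    padded := (others => '0');
                    padded(TAG_BITS - 1 downto 0) := wr_tag;
                    -- write only the selected way's bytes; synthesis can
                    -- map this onto the RAM's byte write-enables
                    for w in 0 to 3 loop
                        if w = way then
                            for b in 0 to TAG_PAD / 8 - 1 loop
                                ram(index)(w * TAG_PAD + b * 8 + 7
                                           downto w * TAG_PAD + b * 8)
                                    <= padded(b * 8 + 7 downto b * 8);
                            end loop;
                        end if;
                    end loop;
                end if;
                rd_set <= ram(index);
            end if;
        end process;
    end architecture;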
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This rearranges the multiplexing of cache read data with forwarded
store data, with the aim of shortening the path from the req_hit_ways
signal to the r1.data_out register. The forwarding decisions are now
made for each way independently, and the results are then combined
according to which way detected a cache hit.
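Something along these lines (signal names and the way count are
illustrative, not the actual dcache code):

    library ieee;
    use ieee.std_logic_1164.all;

    entity rdata_fwd is
        port (
            cache_rdata : in  std_ulogic_vector(4 * 64 - 1 downto 0);
            fwd_data    : in  std_ulogic_vector(63 downto 0);
            fwd_way     : in  std_ulogic_vector(3 downto 0); -- forward for this way?
            hit_ways    : in  std_ulogic_vector(3 downto 0); -- at most one-hot
            data_out    : out std_ulogic_vector(63 downto 0)
        );
    end entity;

    architecture rtl of rdata_fwd is
    begin
        process(all)
            variable way_data : std_ulogic_vector(63 downto 0);
            variable acc      : std_ulogic_vector(63 downto 0);
        begin
            acc := (others => '0');
            for i in 0 to 3 loop
                -- forwarding decided per way, without waiting for the
                -- late-arriving hit information
                if fwd_way(i) = '1' then
                    way_data := fwd_data;
                else
                    way_data := cache_rdata(i * 64 + 63 downto i * 64);
                end if;
                -- AND-OR combine: only the hitting way contributes
                if hit_ways(i) = '1' then
                    acc := acc or way_data;
                end if;
            end loop;
            data_out <= acc;
        end process;
    end architecture;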
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
With a slight rearrangement of the state machine in the dcache_slow
process, we can remove one of the two comparators that detect writes
by other entities to the reservation granule. The state machine now
sets the wishbone cyc signal on the transition from IDLE to DO_STCX
state. Once we see the wishbone stall signal at 0, we consider that we
own the wishbone and can assert stb to do the write, provided that
the stcx is to the reservation address and we haven't seen another
write to the reservation granule. We keep the comparator that
compares the snoop address delayed by one cycle, in order to make
timing easier, and the one (or more) cycle delay between cyc and stb
covers that one cycle delay in the kill_rsrv signal.
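A rough sketch of the handshake (state and signal names are
illustrative, and ack/completion handling is elided):

    library ieee;
    use ieee.std_logic_1164.all;

    entity stcx_wb is
        port (
            clk        : in  std_ulogic;
            stcx_start : in  std_ulogic; -- stcx request leaving IDLE
            addr_match : in  std_ulogic; -- stcx address == reservation address
            kill_rsrv  : in  std_ulogic; -- snooped granule write, 1 cycle late
            wb_stall   : in  std_ulogic;
            wb_cyc     : out std_ulogic;
            wb_stb     : out std_ulogic
        );
    end entity;

    architecture rtl of stcx_wb is
        type state_t is (IDLE, DO_STCX);
        signal state : state_t := IDLE;
        signal cyc   : std_ulogic := '0';
    begin
        wb_cyc <= cyc;
        -- stb can only be asserted once the bus has unstalled, so the
        -- >= 1 cycle between cyc and stb covers the delayed kill_rsrv
        wb_stb <= '1' when state = DO_STCX and cyc = '1' and wb_stall = '0'
                           and addr_match = '1' and kill_rsrv = '0'
                  else '0';

        process(clk)
        begin
            if rising_edge(clk) then
                case state is
                    when IDLE =>
                        if stcx_start = '1' then
                            cyc <= '1';   -- claim the bus on the transition
                            state <= DO_STCX;
                        end if;
                    when DO_STCX =>
                        if wb_stall = '0' then
                            -- write issued, or stcx failed; either way
                            -- drop the bus (ack handling elided)
                            cyc <= '0';
                            state <= IDLE;
                        end if;
                end case;
            end if;
        end process;
    end architecture;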
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The reset was originally added to reduce metavalue warnings in
simulation. It is not necessary for correct operation, and it showed up
as a critical path in synthesis for the Xilinx Artix-7. Remove it when
doing synthesis; for simulation we set the value read to X rather than
0 in order to catch any use of the previously reset value.
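Roughly (names illustrative):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity ram_no_reset is
        port (
            clk      : in  std_ulogic;
            rd_addr  : in  std_ulogic_vector(3 downto 0);
            rd_valid : in  std_ulogic; -- is the value being read meaningful?
            rd_data  : out std_ulogic_vector(63 downto 0)
        );
    end entity;

    architecture rtl of ram_no_reset is
        type ram_t is array (0 to 15) of std_ulogic_vector(63 downto 0);
        signal ram : ram_t;
    begin
        process(clk)
        begin
            if rising_edge(clk) then
                -- no reset on the read path for synthesis
                rd_data <= ram(to_integer(unsigned(rd_addr)));
                -- synthesis translate_off
                if rd_valid = '0' then
                    -- simulation only: any use of the stale value shows
                    -- up as X propagation instead of silently reading 0
                    rd_data <= (others => 'X');
                end if;
                -- synthesis translate_on
            end if;
        end process;
    end architecture;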
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This gets rid of some largish comparators in the dcache_request
process by matching the index and way that hit in the cache tags instead
of comparing tag values. That is, some tag comparisons can be
replaced by seeing if both tags hit in the same cache way.
When reloading a cache line, we now set it valid at the beginning of
the reload, so that tag matching produces hits we can compare. While
the reload is still
occurring, accesses to doublewords that haven't yet been read are
indicated with req_is_hit = 0 and req_hit_reload = 1 (i.e. are
considered to be a miss, at least for now).
For the comparison of whether a subsequent access is to the same page
as stores already being performed, in virtual mode (TLB being used) we
now compare the way and index of the hit in the TLB, and in real mode
we compare the effective address. If any new entry has been loaded
into the TLB since the access we're comparing against, then it is
considered to be a different page.
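The test reduces to something like this (types and widths are
illustrative, not the actual dcache code):

    library ieee;
    use ieee.std_logic_1164.all;

    package same_page_pkg is
        function same_page(virt_mode        : std_ulogic;
                           way_a, way_b     : std_ulogic_vector(1 downto 0);
                           idx_a, idx_b     : std_ulogic_vector(5 downto 0);
                           tlb_loaded_since : std_ulogic;
                           ea_a, ea_b       : std_ulogic_vector(63 downto 12))
            return boolean;
    end package;

    package body same_page_pkg is
        function same_page(virt_mode        : std_ulogic;
                           way_a, way_b     : std_ulogic_vector(1 downto 0);
                           idx_a, idx_b     : std_ulogic_vector(5 downto 0);
                           tlb_loaded_since : std_ulogic;
                           ea_a, ea_b       : std_ulogic_vector(63 downto 12))
            return boolean is
        begin
            if virt_mode = '1' then
                -- hitting the same TLB way and index implies the same
                -- page, unless the TLB has been written in between
                return way_a = way_b and idx_a = idx_b
                       and tlb_loaded_since = '0';
            else
                -- real mode: compare effective page numbers directly
                return ea_a = ea_b;
            end if;
        end function;
    end package body;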
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
A dcbz operation to memory that is mapped as non-cacheable in the page
tables doesn't cause an alignment interrupt, but neither was it
implemented properly in the dcache: it did perform the 8 writes to
memory, but it also incorrectly created a zero-filled line in the cache.
This fixes it so that dcbz to memory mapped non-cacheable doesn't
write the cache tag or set any line valid. We now have r1.reloading,
which is 1 in RELOAD_WAIT_ACK state only if the memory is cacheable and
the cache should therefore be updated (i.e. it is zero in
RELOAD_WAIT_ACK state if we are doing a non-cacheable dcbz).
We can now also remove the code in loadstore1 that checks for
non-cacheable dcbz, which only triggered when doing dcbz in real mode
to an address in the Cxxxxxxx range.
Also remove some unused variables and signals.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Rather than combining the results of the per-way comparators into
an encoded 'hit_way' variable, use the individual results directly
using AND-OR type networks where possible, in order to reduce
utilization and improve timing.
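For example, the cache-hit data selection becomes an AND-OR network
driven directly by the per-way comparator outputs (names illustrative):

    library ieee;
    use ieee.std_logic_1164.all;

    entity hit_mux is
        port (
            way_data : in  std_ulogic_vector(4 * 64 - 1 downto 0);
            hit_ways : in  std_ulogic_vector(3 downto 0); -- raw comparator outputs
            any_hit  : out std_ulogic;
            rd_data  : out std_ulogic_vector(63 downto 0)
        );
    end entity;

    architecture rtl of hit_mux is
    begin
        -- no encode to a 2-bit hit_way and decode back again
        any_hit <= or hit_ways;     -- VHDL-2008 unary reduction

        process(all)
            variable acc : std_ulogic_vector(63 downto 0);
        begin
            acc := (others => '0');
            for i in 0 to 3 loop
                -- AND each way's data with its hit bit and OR the
                -- results; hit_ways is at most one-hot, so this is a mux
                if hit_ways(i) = '1' then
                    acc := acc or way_data(i * 64 + 63 downto i * 64);
                end if;
            end loop;
            rd_data <= acc;
        end process;
    end architecture;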
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of a single global timebase register in the SoC, we now have a
timebase counter in each core. These counters are reset only by the SoC
reset, not the core reset, so they stay in sync even when some cores
are disabled (via the syscon cpu_ctrl register).
This implements mtspr to the TBLW and TBUW SPRs, which write the lower
and upper 32 bits of this core's timebase, respectively.
To fulfil the ISA's requirements that the platform provide (a) some
method for getting the timebases into sync and (b) some method for
preventing userspace from reading the timebase, this adds a syscon
register TB_CTRL with two read/write bits implemented;
bit 0 freezes all the timebases in the system when set, and bit 1
makes reading the timebase privileged (in all cores).
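The per-core counter then looks roughly like this (port names are
illustrative; the privileged-read bit only affects mfspr decode, which
is not shown):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity timebase is
        port (
            clk       : in  std_ulogic;
            soc_reset : in  std_ulogic; -- SoC reset, not the core reset
            tb_freeze : in  std_ulogic; -- TB_CTRL bit 0, common to all cores
            wr_lower  : in  std_ulogic; -- mtspr TBLW
            wr_upper  : in  std_ulogic; -- mtspr TBUW
            wr_data   : in  std_ulogic_vector(31 downto 0);
            tb        : out std_ulogic_vector(63 downto 0)
        );
    end entity;

    architecture rtl of timebase is
        signal count : unsigned(63 downto 0) := (others => '0');
    begin
        tb <= std_ulogic_vector(count);

        process(clk)
        begin
            if rising_edge(clk) then
                if soc_reset = '1' then
                    count <= (others => '0');
                elsif wr_lower = '1' then
                    count(31 downto 0) <= unsigned(wr_data);
                elsif wr_upper = '1' then
                    count(63 downto 32) <= unsigned(wr_data);
                elsif tb_freeze = '0' then
                    count <= count + 1;
                end if;
            end if;
        end process;
    end architecture;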
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
SPR numbers 808-811 do nothing when read or written; that is, mfspr
doesn't modify the destination register. This is accomplished in the
same way that privileged mfspr to an unimplemented SPR is made a
no-op, by supplying the old contents of the destination register as an
input and writing that same value back.
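In essence (names illustrative):

    library ieee;
    use ieee.std_logic_1164.all;

    entity mfspr_result is
        port (
            spr_num   : in  natural range 0 to 1023;
            spr_value : in  std_ulogic_vector(63 downto 0); -- real SPR data
            old_rt    : in  std_ulogic_vector(63 downto 0); -- current RT contents
            result    : out std_ulogic_vector(63 downto 0)
        );
    end entity;

    architecture rtl of mfspr_result is
    begin
        process(all)
        begin
            case spr_num is
                when 808 to 811 =>
                    -- no-op SPRs: "read" the destination register's old
                    -- contents, so writing the result back changes nothing
                    result <= old_rt;
                when others =>
                    result <= spr_value;
            end case;
        end process;
    end architecture;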
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Of the defined aspect bits (which are all read-write), only the NPHIE
and PHIE bits have any function at all, since Microwatt is an in-order
single-issue machine and never does any branch speculation. Also,
since there is no privileged non-hypervisor mode, the high 32 bits of
DEXCR do nothing.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This moves HASHKEYR and HASHPKEYR to the SPR RAM that also stores
things such as SRR0/1, LR and CTR. For hashst[p] and hashchk[p]
instructions, execute1 reads the relevant key register from the RAM
and sends it to loadstore1. This saves several LUTs.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of a single input_reg_b_t field in the decode table, which
selected both whether input B is a register or a constant and which
constant (immediate value) to use, we now have one field which selects
whether input B is immediate (constant), a GPR, or an FPR, and a
separate field to select which sort of immediate value to use. This
results in simpler logic and better timing.
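The split looks something like this (type, value, and field names are
illustrative, not the exact ones in the real decode table):

    package decode_b_pkg is
        -- where input B comes from
        type input_reg_b_t is (IMM, GPR, FPR);
        -- which immediate format to use when input B is IMM
        type immediate_t is (IMM_NONE, IMM_SI, IMM_SI_HI, IMM_UI,
                             IMM_UI_HI, IMM_D, IMM_DS);

        type decode_rom_t is record
            input_reg_b : input_reg_b_t;
            imm_sel     : immediate_t;
            -- ... other decode fields elided
        end record;
    end package;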
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
These provide facilities similar to hashst, hashchk and HASHKEYR, but
restricted to privileged mode.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Previously the computation of whether an instruction is privileged or
not was done based on the insn_type. However, that meant that l*cix
(OP_LOAD) and st*cix (OP_STORE) couldn't be made privileged, and
neither could tlbsync (OP_NOP).
Instead, this adds a field to the main instruction decode table to
indicate privileged instructions, and makes the cache-inhibited loads
and stores privileged, along with tlbsync.
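Roughly (field and function names illustrative):

    library ieee;
    use ieee.std_logic_1164.all;

    package priv_decode_pkg is
        type decode_rom_t is record
            privileged : std_ulogic;   -- new decode-table field
            -- ... other decode fields elided
        end record;

        -- true when a problem-state thread attempts a privileged
        -- instruction, regardless of its insn_type
        function priv_fault(d : decode_rom_t; msr_pr : std_ulogic)
            return boolean;
    end package;

    package body priv_decode_pkg is
        function priv_fault(d : decode_rom_t; msr_pr : std_ulogic)
            return boolean is
        begin
            return d.privileged = '1' and msr_pr = '1';
        end function;
    end package body;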
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
These are done in loadstore1. The HashDigest function is computed in
9 cycles; for 8 cycles, a state machine does 4 steps of key expansion
per cycle, and for each of 4 lanes of data, does 4 steps of ciphering;
then there is 1 cycle to combine the results into the final hash
value.
At present, hashchk does not overlap the computation of the hash with
fetching of data from memory (in the case of a cache miss).
The 'is_signed' field in the instruction decode table is used to
distinguish hashst and hashchk from ordinary loads and stores. We
have a new 'RBC' value for input_reg_c_t which says that we are
reading RB but we want the value to come in via the C port; this is
because we want the 5-bit immediate offset on the B port.
Note that in the list of insn_code values, hashst/chk have been put in
the section for instructions with an RB operand, which is not strictly
correct given that the B port is used for the immediate D operand;
however, adding them to the section for instructions without an RB
operand would have made that section exceed 128 entries, causing
changes to the padding needed. The only downside to having hashst/chk
where they are is that the debug logic can't use the RB port to read
GPR/FPRs when a hashst/chk instruction is being decoded.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of doing the address subtractions and subsequent logic for
DAWR hit detection in the second cycle of a load or store, this does
the subtractions in the first cycle and the remaining logic in the
second cycle. This improves timing.
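Schematically (names and widths illustrative; DAWRX masking and the
other match qualifiers are elided):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity dawr_match is
        port (
            clk        : in  std_ulogic;
            ea         : in  std_ulogic_vector(63 downto 3); -- doubleword address
            dawr_start : in  std_ulogic_vector(63 downto 3);
            dawr_end   : in  std_ulogic_vector(63 downto 3);
            enable     : in  std_ulogic;
            hit        : out std_ulogic
        );
    end entity;

    architecture rtl of dawr_match is
        signal ge_start, le_end : std_ulogic;  -- cycle-1 results
    begin
        process(clk)
        begin
            if rising_edge(clk) then
                -- cycle 1: only the wide subtractions/comparisons
                ge_start <= '1' when unsigned(ea) >= unsigned(dawr_start)
                            else '0';
                le_end   <= '1' when unsigned(ea) <= unsigned(dawr_end)
                            else '0';
                -- cycle 2: the remaining cheap combining logic
                hit <= ge_start and le_end and enable;
            end if;
        end process;
    end architecture;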
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
For the sake of overall timing in larger SoCs, remove the early_sel
optimization when there are more than 4 masters.
Also make the ack and stall signals to a particular master depend on
that master's cyc, not on the busy signal, which can depend on any
master's cyc.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements the server field in the XISRs (external interrupt
source registers), allowing each interrupt source to be directed to a
particular CPU. If the CPU number that is written is out of range,
CPU 0 is used.
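i.e. something like (names illustrative):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity irq_server is
        generic ( NCPUS : natural := 2 );
        port (
            wr_server : in  std_ulogic_vector(7 downto 0); -- server field written
            server    : out natural range 0 to 255         -- CPU to deliver to
        );
    end entity;

    architecture rtl of irq_server is
    begin
        process(all)
            variable s : natural range 0 to 255;
        begin
            s := to_integer(unsigned(wr_server));
            if s >= NCPUS then
                s := 0;    -- out-of-range CPU numbers fall back to CPU 0
            end if;
            server <= s;
        end process;
    end architecture;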
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds an 'NCPUS' generic parameter to the soc module, which then
includes that many CPU cores.
The cores have separate addresses on the DMI interconnect, meaning
that external JTAG debug tools can view and control the state of each
core individually.
The syscon module has a new 'cpu_ctrl' register, where byte 0 contains
individual enable bits for each core, and byte 1 indicates the number
of cores. If a core's enable bit is clear, the core is held in reset.
On system reset, the enable byte is set to 0x01, so only core 0 is
active.
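The control register and the per-core resets reduce to roughly this
(names illustrative):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity cpu_ctrl is
        generic ( NCPUS : natural range 1 to 8 := 2 );
        port (
            clk      : in  std_ulogic;
            soc_rst  : in  std_ulogic;
            wr_en    : in  std_ulogic;
            wr_data  : in  std_ulogic_vector(7 downto 0);
            rd_data  : out std_ulogic_vector(15 downto 0);
            core_rst : out std_ulogic_vector(NCPUS - 1 downto 0)
        );
    end entity;

    architecture rtl of cpu_ctrl is
        signal enables : std_ulogic_vector(7 downto 0) := x"01";
    begin
        -- byte 0: per-core enable bits; byte 1: number of cores
        rd_data <= std_ulogic_vector(to_unsigned(NCPUS, 8)) & enables;

        -- a core whose enable bit is clear is held in reset
        rst_gen : for i in 0 to NCPUS - 1 generate
            core_rst(i) <= soc_rst or not enables(i);
        end generate;

        process(clk)
        begin
            if rising_edge(clk) then
                if soc_rst = '1' then
                    enables <= x"01";  -- only core 0 enabled after reset
                elsif wr_en = '1' then
                    enables <= wr_data;
                end if;
            end if;
        end process;
    end architecture;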
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This does bperm in the bitsort unit instead of the logical unit, and
no longer tries to do it in a single cycle with eight 64-to-1
multiplexers. Instead it is now a state machine that takes 8 cycles and
uses only one 64-to-1 multiplexer. This helps
improve timing and reduces LUT usage.
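The sequential version is roughly as follows (names illustrative; the
ISA's big-endian byte/bit numbering is glossed over here):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity bperm_seq is
        port (
            clk    : in  std_ulogic;
            start  : in  std_ulogic;
            rs     : in  std_ulogic_vector(63 downto 0); -- index bytes
            rb     : in  std_ulogic_vector(63 downto 0); -- source bits
            done   : out std_ulogic;
            result : out std_ulogic_vector(7 downto 0)
        );
    end entity;

    architecture rtl of bperm_seq is
        signal count     : unsigned(2 downto 0) := (others => '0');
        signal busy      : std_ulogic := '0';
        signal idx_bytes : std_ulogic_vector(63 downto 0);
    begin
        process(clk)
            variable idx : unsigned(7 downto 0);
        begin
            if rising_edge(clk) then
                done <= '0';
                if start = '1' then
                    busy <= '1';
                    count <= "000";
                    idx_bytes <= rs;
                elsif busy = '1' then
                    -- one index byte per cycle through a single
                    -- 64-to-1 multiplexer
                    idx := unsigned(idx_bytes(7 downto 0));
                    if idx < 64 then
                        result(to_integer(count)) <=
                            rb(to_integer(idx(5 downto 0)));
                    else
                        result(to_integer(count)) <= '0';
                    end if;
                    idx_bytes <= x"00" & idx_bytes(63 downto 8);
                    count <= count + 1;
                    if count = 7 then
                        busy <= '0';
                        done <= '1';
                    end if;
                end if;
            end if;
        end process;
    end architecture;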
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of creating a 2-bit encoded bypass selector, we now have a
4-bit encoding where bits 1 to 3 enable separate bypass sources, and
bit 0 indicates if any bypass should be used. This results in
slightly simpler logic and better timing.
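The selection then looks like this (source names illustrative):

    library ieee;
    use ieee.std_logic_1164.all;

    entity bypass_mux is
        port (
            reg_data : in  std_ulogic_vector(63 downto 0);
            src1     : in  std_ulogic_vector(63 downto 0);
            src2     : in  std_ulogic_vector(63 downto 0);
            src3     : in  std_ulogic_vector(63 downto 0);
            -- bit 0: use some bypass; bits 1-3: one-hot source select
            sel      : in  std_ulogic_vector(3 downto 0);
            data_out : out std_ulogic_vector(63 downto 0)
        );
    end entity;

    architecture rtl of bypass_mux is
    begin
        process(all)
            variable b : std_ulogic_vector(63 downto 0);
        begin
            -- AND-OR select among the bypass sources via the one-hot bits
            b := (others => '0');
            if sel(1) = '1' then b := b or src1; end if;
            if sel(2) = '1' then b := b or src2; end if;
            if sel(3) = '1' then b := b or src3; end if;
            -- bit 0 makes the final choice a cheap 2-to-1 mux
            if sel(0) = '1' then
                data_out <= b;
            else
                data_out <= reg_data;
            end if;
        end process;
    end architecture;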
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The tags for the bypass data paths back to decode2 don't really need
to depend on the stall/busy inputs or on whether an exception might be
generated, since the bypass values won't be used until the instruction
gets executed. Therefore, this simplifies the expressions for
bypass_data.tag.valid and bypass_cr_data.tag.valid.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements the DAWR0, DAWRX0, DAWR1, and DAWRX1 registers, which
provide the ability to set watchpoints on two ranges of data addresses
and take an interrupt when an access is made to either range.
The address comparisons are done in loadstore1 in the second cycle
(doing it in the first cycle turned out to have poor timing). If a
match is detected, a signal is sent to the dcache which causes the
access to fail and generate an error signal back to loadstore1, in
much the same way that a protection violation would, whereupon a data
storage interrupt is generated.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This arranges for SIAR and SDAR to be set when a trace interrupt
is triggered by a non-zero setting of the MSR[TE] field. According to
the ISA, SIAR should be set to the address of the instruction and SDAR
should be set to the effective address of its storage operand if any.
This also fixes setting of SDAR by the PMU when an alert occurs;
previously it was always just set to zero.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The tests that intentionally generate alignment interrupts now also
check that SRR0 is pointing to a l*arx or st*cx instruction.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
When an alignment interrupt was being generated, loadstore1 was
setting the l_out.valid signal in one cycle and l_out.interrupt in the
next, for the same instruction. This meant that the offending
instruction completed and the interrupt was applied to the next
instruction, so SRR0 ended up pointing to the following instruction.
To fix this, when an access causing an alignment
interrupt is going into r2, we set r2.busy for one cycle and set
r2.one_cycle to 0 so that the complete signal doesn't get asserted.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
CIABR (Completed Instruction Address Breakpoint Register) is an SPR
that contains an instruction address. When the instruction at that
address completes, the CPU takes a Trace interrupt before executing
the next instruction (provided the instruction doesn't cause some
other interrupt and isn't an rfid, hrfid or rfscv instruction).
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This reports the CPU core number, currently always 0, but this will be
useful in future for distinguishing which CPU is which in a
multiprocessor system.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Previously we only put slow requests in r1.req, but that caused timing
problems because it meant the clock enable for all the registers in
r1.req depended on whether we have a TLB and cache hit or not. Now we
put any valid request (i.e. with req_go = 1) into r1.req, which has
better timing because req_go is a relatively simple function of
registered values (r0_full, r0_valid, r0.tlbie, r0.tlbld, r1.full,
r1.ls_error, d_in.hold). We still have to work out if we have a slow
request, but that is only needed for the D input of one register
(r1.full).
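In outline (the record contents are a stand-in, and the clearing of
r1.full on completion is elided):

    library ieee;
    use ieee.std_logic_1164.all;

    entity req_reg is
        port (
            clk         : in  std_ulogic;
            req_go      : in  std_ulogic; -- simple fn of registered values
            req_is_slow : in  std_ulogic; -- deeper logic, feeds one register
            req_in      : in  std_ulogic_vector(127 downto 0); -- r1.req stand-in
            req_out     : out std_ulogic_vector(127 downto 0);
            full        : out std_ulogic
        );
    end entity;

    architecture rtl of req_reg is
    begin
        process(clk)
        begin
            if rising_edge(clk) then
                if req_go = '1' then
                    -- the clock enable for the whole request record is
                    -- just req_go, independent of the TLB/cache hit
                    req_out <= req_in;
                    -- only this one register's D input sees the
                    -- slow-request computation
                    full <= req_is_slow;
                end if;
            end if;
        end process;
    end architecture;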
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements all the sync variants (sync, lwsync, ptesync, etc.) as
an LSU op that gets sent down to the dcache and completes once the
dcache state machine is idle.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>