ee92d6ff | 25-Apr-2025 |
Yanqin Li <[email protected]> |
fix(StoreQueue): add nc_req_ack state to avoid duplicated request (#4625)
## Bug Discovery
The Svpbmt CI of master at https://github.com/OpenXiangShan/XiangShan/actions/runs/14639358525/job/41077890352 reported the following implicit output error:
```
check_misa_h PASSED
test_pbmt_perf TEST: read 4 Bytes 1000 times
Svpbmt IO test... addr:0x10006d000 start: 8589, end: 59845, ticks: 51256
Svpbmt NC test... addr:0x10006c000 start: 67656, end: 106762, ticks: 39106
Svpbmt NC OUTSTANDING test... smblockctl = 0x3f7 addr:0x10006c000 start: 118198, end: 134513, ticks: 16315
Svpbmt PMA test... addr:0x100000000 start: 142696, end: 144084, ticks: 1388
PASSED test_pbmt_ldld_violate
ERROR: untested exception! cause NO: 5 (mhandler, 219)
[FORK_INFO pid(1251274)] clear processes...
Core 0: HIT GOOD TRAP at pc = 0x80005d64
Core-0 instrCnt = 174,141, cycleCnt = 240,713, IPC = 0.723438
```
## Design Background
For NC (Non-Cacheable) store operations, the handshake logic between the StoreQueue and Uncache is as follows:
1. **Without Outstanding Enabled:** In the `nc_idle` state, when an executable `nc store` is encountered, it transitions to the `nc_req` state. After `req.fire`, it moves to the `nc_resp` state. Once `resp.fire` is triggered, it returns to `nc_idle`, and both `rdataPtrExtNext` and `deqPtrExtNext` are updated to handle the next request.
2. **With Outstanding Enabled:** In the `nc_idle` state, upon encountering an executable `nc store`, it transitions to the `nc_req` state. After `req.fire`, it **returns to `nc_idle`** (Point A). Once the request is fully written into Uncache, i.e., upon receiving `ncSlaveAck` (Point B), it updates `rdataPtrExtNext` and `deqPtrExtNext` to handle the next request.
## Bug Description
In the above scenario, since the transition to `nc_idle` at Point A occurs two cycles earlier than Point B, the `rdataPtr` at Point A still points to the location of the previous uncache request (call it NC1). The condition for sending an uncache request is still met at that moment, so Point A issues a **duplicate `uncache` request** for NC1.
By the time Point B occurs, **two identical requests for NC1** have already been sent. At Point B, `rdataPtr` is updated to proceed to the next request. However, when the **second `ncSlaveAck`** for NC1 returns, `rdataPtr` is updated **again**, causing it to move forward **twice** for a single request. This eventually results in one of the following requests never being executed.
## Bug Fix
Given that multiple cycles are required to ensure that a request is fully written to Uncache, a new state called `nc_req_ack` is introduced. The revised handshake logic with outstanding enabled is as follows:
In the `nc_idle` state, when an executable `nc store` is encountered, it transitions to the `nc_req` state. After `req.fire`, it moves to the `nc_req_ack` state. Once the request is fully written to Uncache and `ncSlaveAck` is received, it transitions back to `nc_idle` and updates `rdataPtrExtNext` and `deqPtrExtNext` to handle the next request.
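Taken together, the revised handshake reads naturally as a four-state FSM. The Chisel sketch below is illustration only, not the actual StoreQueue code: the state names come from the commit message, while the IO names (`ncStoreValid`, `reqFire`, `ncSlaveAck`, `respFire`, `outstanding`, `advancePtrs`) and the explicit outstanding mux are assumptions.

```scala
import chisel3._
import chisel3.util._

// Hypothetical sketch of the revised NC-store handshake; not the XiangShan StoreQueue code.
class NcStoreHandshakeSketch extends Module {
  val io = IO(new Bundle {
    val ncStoreValid = Input(Bool())  // an executable nc store is ready at the deq pointer
    val reqFire      = Input(Bool())  // uncache req handshake completed
    val ncSlaveAck   = Input(Bool())  // request fully written into Uncache (outstanding mode)
    val respFire     = Input(Bool())  // uncache resp handshake (non-outstanding mode)
    val outstanding  = Input(Bool())  // uncache outstanding enabled
    val advancePtrs  = Output(Bool()) // update rdataPtrExtNext / deqPtrExtNext this cycle
  })

  val nc_idle :: nc_req :: nc_req_ack :: nc_resp :: Nil = Enum(4)
  val state = RegInit(nc_idle)

  io.advancePtrs := false.B

  switch(state) {
    is(nc_idle) {
      when(io.ncStoreValid) { state := nc_req }
    }
    is(nc_req) {
      // With outstanding enabled, wait in nc_req_ack for ncSlaveAck instead of returning
      // to nc_idle right after req.fire (the cause of the duplicated request).
      when(io.reqFire) { state := Mux(io.outstanding, nc_req_ack, nc_resp) }
    }
    is(nc_req_ack) {
      when(io.ncSlaveAck) {
        io.advancePtrs := true.B
        state := nc_idle
      }
    }
    is(nc_resp) {
      when(io.respFire) {
        io.advancePtrs := true.B
        state := nc_idle
      }
    }
  }
}
```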
|
6683fc49 | 25-Apr-2025 |
Zhaoyang You <[email protected]> |
fix(csr): filter out Read-Only CSR in regOut (#4412) |
00c6a8aa | 25-Apr-2025 |
Guanghui Cheng <[email protected]> |
fix(criticalError): Stop counting `wfi_cycles` when disable `wfiResume` (#4623)
* The precondition for `commitStuck_overflow` to trigger a critical error is that `WFI` resumes after 1M (2^20) cycles. |
53bd4e1c | 24-Apr-2025 |
Tang Haojin <[email protected]> |
build: add configuration for `CHIAddrWidth` and `enableL2Flush` (#4621) |
1191982f | 24-Apr-2025 |
Zhaoyang You <[email protected]> |
fix(intr,difftest): add interrupt delegate (#4516) |
d011b690 | 24-Apr-2025 |
Ma-YX <[email protected]> |
submodule(CoupledL2): bump CoupledL2 (#4616)
Bump CoupledL2, this PR includes:
1. Set data SRAM's dataSplit = 8
   * Set data SRAM (`dataArray` in `DataStorage`) dataSplit = 8. Previously dataSplit = 4 and encDataBankBits = 137; due to area demand, the `dataArray` SRAM bankBits should be 69, so after ECC encoding the data needed a further split of 2 plus 4 bits of zero padding per cache line (see the width sketch below).
   * Avoid tag split when the tag SRAM's `dataSplit` requirement cannot be met. This occurs when the L2 size, `dataSplit`, or address width changes.
   * Parameterize the split of tag and data.
2. Remove unused register of WriteEvictOrEvict logics
3. Remove deprecated cache step
4. Support parameterized addr width by cde
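For context on the width figures above, here is a small stand-alone Scala calculation assuming a standard SECDED code (smallest r with 2^r >= dataBits + r + 1, plus one overall parity bit). The 137-bit result matches the encDataBankBits quoted in the commit; the 72-bit result for dataSplit = 8 is an inference from the same assumption, as is the 512-bit cache line.

```scala
// Assumed SECDED width: smallest r with 2^r >= dataBits + r + 1, plus one overall parity bit.
object EccWidthSketch extends App {
  def secdedWidth(dataBits: Int): Int = {
    val r = Iterator.from(1).find(k => (1 << k) >= dataBits + k + 1).get
    dataBits + r + 1
  }
  val lineBits = 512 // assumed cache line size
  for (split <- Seq(4, 8)) {
    val bankBits = lineBits / split
    println(s"dataSplit=$split: $bankBits data bits -> ${secdedWidth(bankBits)} encoded bits")
  }
  // dataSplit=4: 128 -> 137, which 69-bit physical banks can only hold as 2 x 69 = 138,
  //              i.e. 1 padding bit per bank, 4 padding bits per cache line
  // dataSplit=8:  64 -> 72, small enough that no further split is needed
}
```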
|
862747db | 23-Apr-2025 |
zhaohong1988 <[email protected]> |
fix: sync the signals before use for lowpower (#4610) |
98339775 | 22-Apr-2025 |
sinceforYy <[email protected]> |
submodule(ready-to-run): Bump nemu ref in ready-to-run
* NEMU commit: 27ca6cb5f7d75014ca795908194bfb39711f9dc2
* NEMU configs:
  * riscv64-xs-ref_defconfig
  * riscv64-dual-xs-ref_defconfig
  * riscv64-xs-ref-debug_defconfig
  * riscv64-dual-xs-ref-debug_defconfig
  * riscv64-xs-ref_bitmap_defconfig

Including:
* fix(xtopi): fix m/stopi.IRPIO generation conditions
* fix(vstopi): fix vstopi result selection
|
6e51c65d | 16-Apr-2025 |
sinceforYy <[email protected]> |
fix(vstopi): fix vstopi result selection
* AIA Spec:
  * Ties in nominal priority are broken as usual by the default priority order from Table 8, unless hvictl fields VTI = 1 and IID ≠ 9 (last item in the candidate list above), in which case default priority order is determined solely by hvictl.DPR.
* If bit IPRIOM (IPRIO Mode) of hvictl is zero, IPRIO in vstopi is 1; else, if the priority number for the highest-priority candidate is within the range 1 to 255, IPRIO is that value; else, IPRIO is set to either 0 or 255 in the manner documented for stopi in Section 5.4.2.
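As a rough illustration of the second quoted rule, the plain-Scala function below restates it. It is not the CSR implementation: the saturation of out-of-range priority numbers to 0 or 255 is an assumption about "the manner documented for stopi in Section 5.4.2", and the hvictl.DPR tie-breaking case from the first bullet is ignored.

```scala
// Illustration only; not the vstopi CSR logic. Out-of-range priority numbers are assumed
// to saturate: below range -> 0, above range -> 255.
object VstopiIprioSketch extends App {
  def vstopiIprio(ipriom: Boolean, candidatePrio: Int): Int =
    if (!ipriom) 1                                        // IPRIOM = 0: IPRIO is always 1
    else if (candidatePrio >= 1 && candidatePrio <= 255) candidatePrio // in range: use it
    else if (candidatePrio < 1) 0                         // assumed saturation low
    else 255                                              // assumed saturation high

  println(vstopiIprio(ipriom = false, candidatePrio = 42))  // 1
  println(vstopiIprio(ipriom = true,  candidatePrio = 42))  // 42
  println(vstopiIprio(ipriom = true,  candidatePrio = 300)) // 255
}
```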
|
ece71978 | 15-Apr-2025 |
sinceforYy <[email protected]> |
fix(xtopi): fix m/stopi.IRPIO generation conditions
* If all bytes of the supervisor-level iprio array are read-only zeros, a simplified implementation of field IPRIO is allowed in which its value is always 1 whenever stopi is not zero.
* We are configurable and do not need to simplify the implementation.
|
9e0994ab | 22-Apr-2025 |
cz4e <[email protected]> |
fix(AXI4Memory): fix write request enqueue DRAMSim logic for AXI4Memory (#4611) |
688cc4e8 | 22-Apr-2025 |
Anzo <[email protected]> |
fix(VLSU): modifying vector misalign elemidx generation (#4593)
For "unit-stride access with element granularity misaligned and emul<0", it could be the case that: has only once valid elements, but
fix(VLSU): modifying vector misalign elemidx generation (#4593)
For "unit-stride access with element granularity misaligned and emul<0", it could be the case that: has only once valid elements, but splits into two flows(misaligned), which would result in the `elemidx` being the same, making it impossible for the exception handling logic in the `mergebuffer` to recognise the correct order.
Instead of adding a new variable, we have chosen to reuse `elemidx` as a marker. But this does pollute the original semantics of `elemidx`.
|
1e7e38e2 | 22-Apr-2025 |
Anzo <[email protected]> |
chore(Parameters): remove the incorrect parameter description (#4391)
This is a misrepresentation; currently, there is only one item in the fofbuffer. |
f9ed852f | 22-Apr-2025 |
NewPaulWalker <[email protected]> |
fix(xiselect): set the minimum range for xiselect (#4594)
The miselect register implements at least enough bits to support all implemented miselect values. The siselect register will support the value range 0..0xFFF at a minimum. The vsiselect register will support the value range 0..0xFFF at a minimum.
|
99a48a76 | 21-Apr-2025 |
cz4e <[email protected]> |
timing(LoadQueueUncache): adjust s1 enq and s2 enq valid generate logic (#4603) |
cd450e32 | 21-Apr-2025 |
Anzo <[email protected]> |
submodule(ready-to-run): bump nemu and spike ref in ready-to-run (#4604) |
51ad03b0 | 21-Apr-2025 |
Zhaoyang You <[email protected]> |
fix(rename): fix Csrr format (#4605) |
d7dd2491 | 21-Apr-2025 |
zhaohong1988 <[email protected]> |
submodule(ChiselAIA): bump ChiselAIA (#4595) |
be3685ff | 21-Apr-2025 |
xu_zh <[email protected]> |
chore(scalastyle): disable space-around-operator checks (#4567) |
b5c820f6 | 21-Apr-2025 |
Guanghui Cheng <[email protected]> |
fix(top): enable cpuclock when debug halt req (#4583) |
3aa632ec | 21-Apr-2025 |
Anzo <[email protected]> |
fix(StoreUnit): cbo violation check should check cacheline (#4592)
The cbo instruction should check for violations at the granularity of cacheline.
Theoretically, modifying the condition of this variable allows the RAW check to be done at cacheline granularity and should not introduce any other side effects.
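As a minimal illustration of what checking "at the granularity of cacheline" means, the hypothetical helper below compares two addresses after dropping the line-offset bits; it is not the StoreUnit code, and the 64-byte line size is an assumption.

```scala
// Hypothetical helper, not the StoreUnit RAW-check code.
object CachelineMatchSketch extends App {
  val lineBytes  = 64                                     // assumed cacheline size
  val offsetBits = Integer.numberOfTrailingZeros(lineBytes)

  // Two accesses conflict at cacheline granularity iff their line indices match.
  def sameCacheline(a: Long, b: Long): Boolean = (a >>> offsetBits) == (b >>> offsetBits)

  println(sameCacheline(0x80000000L, 0x8000003FL)) // true: same 64-byte line
  println(sameCacheline(0x80000000L, 0x80000040L)) // false: adjacent line
}
```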
|
ce78e60c | 21-Apr-2025 |
Anzo <[email protected]> |
fix(StoreQueue): remove `cboZeroUop` saved `sqptr` (#4591) |
dccbba58 | 21-Apr-2025 |
cz4e <[email protected]> |
fix(MainPipe): fix error report valid when Atomics and SBuffer request miss (#4572)
* Sbuffer writes and atomics should not report errors, but a refill from L2 should report an ECC error. However, requests in the MissQueue carry `isAmo`/`isStore` together with `req.miss`, so `(s2_req.isAMO || s2_req.isStore)` also covers the refill, and a missed `isAmo`/`isStore` request would therefore never report an error.
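The intended behaviour can be captured by a small boolean model (illustration only; the field names follow the commit text, and the exact fixed expression in MainPipe may differ): store/AMO accesses themselves stay silent, while a refill caused by a missed store/AMO still reports an ECC error.

```scala
// Boolean model of the described intent; not the MainPipe RTL.
object MainPipeErrorSketch extends App {
  case class Req(isAMO: Boolean, isStore: Boolean, miss: Boolean)

  // Suppress reporting only for non-miss store/AMO accesses, so refills still report.
  def reportEccError(eccError: Boolean, req: Req): Boolean =
    eccError && !((req.isAMO || req.isStore) && !req.miss)

  println(reportEccError(eccError = true, Req(isAMO = false, isStore = true, miss = false))) // false: plain sbuffer write
  println(reportEccError(eccError = true, Req(isAMO = false, isStore = true, miss = true)))  // true: refill for a missed store
}
```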
|
d69a8286 | 21-Apr-2025 |
cz4e <[email protected]> |
fix(MainPipe): fix `s1_way_en` logic when pseudo tag error inject (#4573)
* `s1_way_en` should use `io.pseudo_error.valid` of the same stage, not `RegNext(io.pseudo_error.valid)`. Otherwise, `MainPipe` may use the wrong way enable to write the DCache, which results in two identical tags.
|
57a8ca5e | 20-Apr-2025 |
Haoyuan Feng <[email protected]> |
fix(LLPTW): dup_wait_resp should not send last_hptw_req when excp (#4596)
In the original design, the condition for `to_last_hptw_req` was: `dup_wait_resp && entries(io.mem.resp.bits.id).req_info.s2xlate === allStage`. As a result, when a newly entered LLPTW request duplicates an entry currently returning from memory, and the new request is marked as allStage, the `to_last_hptw_req` signal would be true. This causes the state machine to transition to the `state_last_hptw_req` state and send a request to HPTW.
However, if the page table returned from memory contains a `vsStagePf` or `gStagePf`, it should directly go to `mem_out` or `bitmap_check` without performing a final HPTW translation. Therefore, this commit fixes the bug by adding a restriction to the original `to_last_hptw_req` condition to ensure that no exceptions are present; otherwise, the state machine will transition to either `mem_out` or `bitmap_check`.
Additionally, this PR also fixes a bug where `last_hptw_req_ppn` did not account for the napot case.
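A small boolean model of the tightened condition (illustration only; signal names follow the commit text, and neither the real LLPTW entry bookkeeping nor the napot ppn fix is modelled):

```scala
// Boolean model of the fixed to_last_hptw_req condition; not the LLPTW RTL.
object LlptwToLastHptwSketch extends App {
  case class RespPte(vsStagePf: Boolean, gStagePf: Boolean)

  // Only go to state_last_hptw_req when the returned page table carries no exception;
  // otherwise the entry should head to mem_out or bitmap_check instead.
  def toLastHptwReq(dupWaitResp: Boolean, isAllStage: Boolean, pte: RespPte): Boolean =
    dupWaitResp && isAllStage && !(pte.vsStagePf || pte.gStagePf)

  println(toLastHptwReq(dupWaitResp = true, isAllStage = true, RespPte(false, false))) // true
  println(toLastHptwReq(dupWaitResp = true, isAllStage = true, RespPte(true,  false))) // false
}
```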
|