14651e98 | 07-Jan-2025 |
Anzo <[email protected]> |
fix(StoreQueue): remove the incorrect redirect logic (#4139) |
da51a7ac | 07-Jan-2025 |
Anzo <[email protected]> |
fix(VLSU): fix vector exception writeback to 'MergeBuffer' logic (#4137)
Fix a bug where the exception signal could be lost on writeback.
Previously, we intended to compare only the writeback ports that triggered an exception and pick the oldest.
However, the implementation did not match the comment: if a writeback port without an exception was older, it won the comparison, and the port that actually triggered the exception was not selected.
Use `s3_exception` to try to improve timing.
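The selection rule can be modeled in a few lines. This is a minimal Python sketch, assuming each writeback port carries a valid bit, an exception bit and a plain integer age where smaller means older (robIdx wraparound is ignored). `Writeback`, `buggy_select` and `fixed_select` are hypothetical names, not the `MergeBuffer` interface:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Writeback:
    valid: bool
    exception: bool
    age: int  # hypothetical age key: smaller means older (no pointer wraparound)


def pick_oldest(ports: List[Writeback], candidates: List[int]) -> Optional[int]:
    """Return the index of the oldest port among `candidates`, or None."""
    if not candidates:
        return None
    return min(candidates, key=lambda i: ports[i].age)


def buggy_select(ports: List[Writeback]) -> Optional[int]:
    # Compares every valid writeback, then checks the winner for an exception.
    winner = pick_oldest(ports, [i for i, p in enumerate(ports) if p.valid])
    if winner is not None and ports[winner].exception:
        return winner
    return None  # an exception on a younger port is silently dropped


def fixed_select(ports: List[Writeback]) -> Optional[int]:
    # Only ports that actually raised an exception take part in the comparison.
    return pick_oldest(
        ports, [i for i, p in enumerate(ports) if p.valid and p.exception]
    )


ports = [
    Writeback(valid=True, exception=False, age=3),  # older, no exception
    Writeback(valid=True, exception=True, age=7),   # younger, raised an exception
]
print(buggy_select(ports))  # None: the exception is lost
print(fixed_select(ports))  # 1
```

Filtering on the exception bit before the age comparison is what keeps an older, exception-free port from masking the faulting one.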
|
a035c20d | 02-Jan-2025 |
Yanqin Li <[email protected]> |
fix(LQUncache): fix a potential deadlock when enqueuing (#4096)
**Old design:** When enqueuing, entries are allocated in port order ldu0-2, i.e., ldu0 is allocated first.
**Bug scene:** LQUncacheBuffer is small. The enqueuing `robIdx` values of ldu0-2 are [57, 56, 55]; 57 and 56 can enqueue, but 55 cannot because the buffer is full. After enqueuing, 57 and 56 send their `NC` requests, and 55 is rolled back. In principle, 57 and 56 should then be flushed, but to preserve the correspondence between uncache requests and responses, 57 is only flushed when its uncache response arrives. So when the same sequence [57, 56, 55] arrives again, there is still no space for 55, and it is rolled back again, resulting in a deadlock. This bug was triggered after cutting `LoadUncacheBufferSize` from 20 to 4.
**One way to fix:** When enqueuing, allocate in `robIdx` order, i.e., the oldest is allocated first.
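The effect of the enqueue order can be sketched in Python. This assumes `robIdx` is a plain integer where smaller is older (wraparound ignored) and that the buffer has exactly two free slots; `enqueue` is a hypothetical helper, not the LQUncache code:

```python
from typing import List, Tuple


def enqueue(rob_idxs: List[int], free_slots: int, by_age: bool) -> Tuple[List[int], List[int]]:
    """Allocate up to `free_slots` entries; return (allocated, rolled_back).

    Assumption: robIdx is a plain integer and a smaller value is older
    (the real circular-pointer wraparound is ignored).
    """
    # Port order is ldu0, ldu1, ldu2; age order puts the oldest robIdx first.
    order = sorted(rob_idxs) if by_age else list(rob_idxs)
    return order[:free_slots], order[free_slots:]


requests = [57, 56, 55]  # robIdx of ldu0-2, from the bug scene
print(enqueue(requests, free_slots=2, by_age=False))  # ([57, 56], [55])
print(enqueue(requests, free_slots=2, by_age=True))   # ([55, 56], [57])
```

With age order, the oldest request (55) always gets a free slot first, so it cannot be the one that is rolled back again and again while 57 holds an entry waiting for its response.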
|
30bd4482 | 30-Dec-2024 |
Anzo <[email protected]> |
fix(LSQ): fix 'enqCancelNum' bit width (#4109) |
c2acf9ea | 30-Dec-2024 |
Anzo <[email protected]> |
fix(StoreQueue): fix `vecLastFlow` set logic (#4105) |
0a84afd5 | 26-Dec-2024 |
cz4e <[email protected]> |
area(VirtualLoadQueue): remove useless regs (#4061)
* remove `datavalid`, `addrvalid`, `veccommitted`
* add `committed` |
be8e95bc | 25-Dec-2024 |
Anzo <[email protected]> |
fix(MemBlock): fix overflow during lsqptr calculation (#4084)
The addition previously used to calculate the `lsq` pointer could overflow: `numLsElem` is 5 bits wide, and accumulating it across multiple uops overflows that width.
---
In theory this should also have been a problem in earlier versions, but for some reason the bug did not show up until `newDispatch`.
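A Python sketch of why summing a 5-bit count across several uops wraps around, and how sizing the accumulator for the worst-case sum avoids it. The per-uop counts and the assumed per-uop maximum of 16 are hypothetical; the commit does not state how wide the corrected adder is:

```python
def truncate(value: int, width: int) -> int:
    """Keep the low `width` bits, as a fixed-width hardware adder would."""
    return value & ((1 << width) - 1)


def accumulate(counts, width: int) -> int:
    """Sum `counts` in a register that is only `width` bits wide."""
    total = 0
    for c in counts:
        total = truncate(total + c, width)
    return total


def required_width(max_count: int, num_uops: int) -> int:
    """Bits needed to hold the worst-case sum of `num_uops` counts."""
    return (max_count * num_uops).bit_length()


NUM_LS_ELEM_WIDTH = 5  # bit width of numLsElem, per the commit message
counts = [16, 16, 4]   # hypothetical per-uop element counts

print(accumulate(counts, NUM_LS_ELEM_WIDTH))  # 4: wrapped modulo 32
wide = required_width(max_count=16, num_uops=len(counts))  # 6 bits
print(accumulate(counts, wide))  # 36
```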
|
519244c7 | 25-Dec-2024 |
Yanqin Li <[email protected]> |
submodule(CoupledL2, OpenLLC): support pbmt in CHI scene (#4071)
* L1: deliver the NC and PMA signals of uncacheReq to L2
* L2: [support Svpbmt on CHI MemAttr](https://github.com/OpenXiangShan/CoupledL2/pull/273)
* LLC: [Non-cache requests are forwarded directly downstream without entering the slice](https://github.com/OpenXiangShan/OpenLLC/pull/28)
|
9b12a106 | 25-Dec-2024 |
Anzo <[email protected]> |
area(LoadQueue): remove useless regs (#4062)
The extra release logic for vector loads in the `RAR/RAW Queue` appears unnecessary, and it forced the `RAR/RAW Queue` to store redundant `uopidx` regs.
|
54b55f34 | 24-Dec-2024 |
Yanqin Li <[email protected]> |
fix(LQUncache): consider offset when allocating (#4080)
bug scene:
When the valid vector of ldu0-2 is [0, 0, 1] and the freelist can allocate only one entry (the `canAllocate` vector is [1, 0, 0]), ldu2's request cannot be allocated and is rolled back. This is because the allocation did not take the valid offset into account.
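The difference between indexing `canAllocate` by port number and by valid offset can be modeled as follows. A minimal Python sketch, assuming entry k of `canAllocate` means "a k-th entry can be handed out this cycle"; the function names are hypothetical and `None` marks a port with no request:

```python
from typing import List, Optional


def alloc_by_port(valid: List[int], can_allocate: List[int]) -> List[Optional[bool]]:
    # Old behaviour: ldu i looks at canAllocate[i] regardless of earlier ports.
    return [bool(can_allocate[i]) if v else None for i, v in enumerate(valid)]


def alloc_by_offset(valid: List[int], can_allocate: List[int]) -> List[Optional[bool]]:
    # Fixed behaviour: ldu i uses canAllocate[offset], where offset counts the
    # valid requests on lower-numbered ports (a prefix popcount).
    result: List[Optional[bool]] = []
    offset = 0
    for v in valid:
        if v:
            result.append(bool(can_allocate[offset]))
            offset += 1
        else:
            result.append(None)  # no request on this port
    return result


valid = [0, 0, 1]         # only ldu2 has a request
can_allocate = [1, 0, 0]  # the freelist can hand out a single entry
print(alloc_by_port(valid, can_allocate))    # [None, None, False]: ldu2 rolled back
print(alloc_by_offset(valid, can_allocate))  # [None, None, True]
```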
|
acc50f3b | 23-Dec-2024 |
Anzo <[email protected]> |
fix(StoreMisalignBuffer): crosspage can only be replaced when `s_idle` (#4077)
Entries in `storeMisalignBuffer` can only be replaced in `s_idle`; once the state has switched and a store is in progress, the entry must not be replaced by a new `req`.
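The replacement guard amounts to a state check before overwriting the entry. A Python sketch, assuming a single buffer entry; only `s_idle` comes from the commit, while `S_SPLIT`, `S_WAIT` and the `MisalignEntry` class are hypothetical stand-ins for the in-progress states:

```python
from enum import Enum, auto


class State(Enum):
    S_IDLE = auto()   # corresponds to `s_idle`
    S_SPLIT = auto()  # hypothetical in-progress state
    S_WAIT = auto()   # hypothetical in-progress state


class MisalignEntry:
    def __init__(self) -> None:
        self.state = State.S_IDLE
        self.req = None

    def try_replace(self, new_req) -> bool:
        """Accept a new cross-page req only while no store is in progress."""
        if self.state is not State.S_IDLE:
            return False  # keep the in-flight store untouched
        self.req = new_req
        return True


entry = MisalignEntry()
entry.req = "store A"
entry.state = State.S_SPLIT
print(entry.try_replace("store B"))  # False: store A stays
entry.state = State.S_IDLE
print(entry.try_replace("store B"))  # True
```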
|
8b33cd30 | 13-Dec-2024 |
klin02 <[email protected]> |
feat(XSLog): move all XSLog outside WhenContext for collection
Data in a WhenContext is not accessible from another module. To support XSLog collection, we move all XSLog calls and related signals outside WhenContext. For example, `when(cond1){XSDebug(cond2, pable)}` becomes `XSDebug(cond1 && cond2, pable)`.
|
5de026b7 | 17-Dec-2024 |
Anzooooo <[email protected]> |
fix(LSQ): modify the enq logic
This commit reworks the previous inefficient enqueue logic, which greatly reduces the generated Verilog:
* StoreQueue: from ~260k lines to ~50k lines
* VirtualLoadQueue: from ~130k lines to ~20k lines
Also, we no longer need to limit `numLsElem` per `io.enq`.
|
0a7d1d5c | 22-Nov-2024 |
xiaofeibao <[email protected]> |
feat(backend): NewDispatch |
562eaa0c | 15-Dec-2024 |
Anzooooo <[email protected]> |
fix(MemBlock): fix misaligned exception and remove redundant reg from `SQ` |
909ea138 | 16-Dec-2024 |
Anzo <[email protected]> |
fix(LSQ): modify misaligned `forward fault` detection (#4038)
Previously, I used an inappropriate method for misaligned accesses to trigger a `forward fault`:
https://github.com/OpenXiangShan/XiangShan/blob/38d0d7c5a34a23dfdb58a3cb2737c3cfddb3ec9d/src/main/scala/xiangshan/mem/lsqueue/StoreQueue.scala#L684-L711
As a result, the `BlockSqIdx` passed to `LoadQueueReplay` used the `sqIdx` from `uop` instead of the `sqIdx` carrying the unalign flag bit:
https://github.com/OpenXiangShan/XiangShan/blob/38d0d7c5a34a23dfdb58a3cb2737c3cfddb3ec9d/src/main/scala/xiangshan/mem/lsqueue/StoreQueue.scala#L776-L782
**This could leave `LoadQueueReplay` stuck.**
To resolve the stall, we then incorrectly introduced commit af757d1b973e03dae3ce0078a4a8249b593188ec.
That commit lets `BlockSqIdx` unblock without `DataValid`, which causes performance issues.
This revision fixes the inappropriate `forward fault` triggering method and reverts commit af757d1b973e03dae3ce0078a4a8249b593188ec.
**This should bring performance back up.**
### Apologies for my mistake.
|
4fb7cc17 | 16-Dec-2024 |
cz4e <[email protected]> |
timing(StoreQueue): cmoReq.address add 1 latch (#3988) |
99baa882 | 13-Dec-2024 |
Anzo <[email protected]> |
fix(StoreQueue): fix the `vecExceptionFlag` setting condition (#4037)
An entry is considered dequeued (`deq`) only when `dataBuffer.io.enq.fire` is asserted. |
433cc30b | 12-Dec-2024 |
Anzo <[email protected]> |
fix(StoreQueue): fix `difftestinfo` for store event (#4027)
Previously, collecting the difftest information did not account for a misaligned store being split when it is written into the Sbuffer. Use a more robust way to obtain the information difftest needs.
|
2159ac24 | 09-Dec-2024 |
Anzooooo <[email protected]> |
fix(selectOldest): use `===` instead of `isNotBefore`
For vector instructions and other instructions with multiple `uop`s, we must first check whether the `robIdx` values are equal before comparing `uopIdx`. Although using `isNotBefore` causes no error, the clearer and more concise `===` is a better fit for this check.
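The resulting age comparison can be sketched in Python. This assumes `robIdx` is modeled as a circular-queue pointer with a wrap flag and an index, which is the usual convention; `RobPtr`, `is_before` and `is_older` are illustrative names, not XiangShan's Chisel API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobPtr:
    flag: bool  # wrap-around flag
    value: int  # index within the ROB


def is_before(a: RobPtr, b: RobPtr) -> bool:
    """True if `a` is older than `b` in a circular queue with a wrap flag."""
    # Same flag: plain index comparison. Different flag: the result inverts.
    return (a.flag != b.flag) != (a.value < b.value)


def is_older(rob_a: RobPtr, uop_a: int, rob_b: RobPtr, uop_b: int) -> bool:
    # Equal robIdx means the same instruction, so fall back to uopIdx.
    if rob_a == rob_b:
        return uop_a < uop_b
    return is_before(rob_a, rob_b)


print(is_older(RobPtr(False, 60), 0, RobPtr(True, 2), 0))  # True: b has wrapped
print(is_older(RobPtr(False, 5), 1, RobPtr(False, 5), 3))  # True: same robIdx, lower uopIdx
```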
|
1b5499a2 | 03-Dec-2024 |
Anzooooo <[email protected]> |
fix(LSU): `rfwen` not be set when `WakeUp` cancelled or not need `WakeUp` |
af757d1b | 01-Dec-2024 |
Anzooooo <[email protected]> |
fix(LoadQueueReplay): more precise for unblocking `forwarding fault`
It is not necessary to check whether the StoreQueue entry pointed to by `sqIdx` has completed, because that entry is the store that follows this load.
|
b240e1c0 | 07-Nov-2024 |
Anzooooo <[email protected]> |
feat(Zicclsm): refactoring misalign and support vector misalign |
549073a0 | 10-Dec-2024 |
cz4e <[email protected]> |
area(Lsq): compress rar/raw paddr and remove sq useless regs (#3976)
* LoadQueueRAR: PAddr hash function, 16 bits in total
* LoadQueueRAW: uses PAddr[29:6], 24 bits in total
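Keeping only PAddr[29:6] means matches are made at 64-byte granularity, and addresses that differ only above bit 29 compare equal. Equal addresses always yield equal keys, so the compression can add false-positive matches but never hide a real one. A Python sketch of the bit slice; `raw_key` and `may_conflict` are hypothetical helper names:

```python
RAW_LO, RAW_HI = 6, 29           # PAddr[29:6], per the commit message
RAW_WIDTH = RAW_HI - RAW_LO + 1  # 24 bits


def raw_key(paddr: int) -> int:
    """Extract PAddr[29:6], the compressed address kept by LoadQueueRAW."""
    return (paddr >> RAW_LO) & ((1 << RAW_WIDTH) - 1)


def may_conflict(load_paddr: int, store_paddr: int) -> bool:
    # Equal keys: same 64-byte line, or an alias differing only above bit 29.
    return raw_key(load_paddr) == raw_key(store_paddr)


base = 0x8000_1040
print(may_conflict(base, base + 0x3F))     # True: same 64-byte line
print(may_conflict(base, base + (1 << 30)))  # True: alias above bit 29
print(may_conflict(base, base + 0x40))     # False: next line
```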
|
e10e20c6 | 27-Nov-2024 |
Yanqin Li <[email protected]> |
style(pbmt): remove the useless and standardize code
* style(pbmt): remove outstanding constant which is just for self-test
* fix(uncache): added mask comparison for `addrMatch`
* style(mem): code normalization
* fix(pbmt): handle cases where the load unit is byte, word, etc
* style(uncache): fix an import
* fix(uncache): address matching should use the non-offset address when forwarding
In this case, to ensure correct forwarding, stores with the same address and overlapping masks cannot be enqueued at the same time.
* style(RAR): remove redundant design of `nc` reg
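The mask-aware address match and the enqueue restriction can be sketched together. This Python model assumes an 8-byte uncache data width, with mask bit i covering byte i of that block; the block size and all function names are hypothetical. Refusing a second store with overlapping bytes presumably ensures a load never has more than one candidate store to forward from:

```python
BLOCK_BYTES = 8  # assumption: data width covered by one uncache entry


def block_addr(addr: int) -> int:
    """Non-offset address: drop the byte offset within the block."""
    return addr & ~(BLOCK_BYTES - 1)


def addr_match(addr_a: int, mask_a: int, addr_b: int, mask_b: int) -> bool:
    # Same block and at least one byte lane in common.
    return block_addr(addr_a) == block_addr(addr_b) and (mask_a & mask_b) != 0


def can_enqueue_store(entries, addr: int, mask: int) -> bool:
    # Reject a store whose bytes overlap an in-flight store in the same block.
    return not any(addr_match(a, m, addr, mask) for a, m in entries)


entries = [(0x1000, 0b0000_1111)]  # in-flight store covering bytes 0-3
print(can_enqueue_store(entries, 0x1004, 0b1111_0000))  # True: bytes 4-7
print(can_enqueue_store(entries, 0x1002, 0b0000_1100))  # False: bytes 2-3 overlap
```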
|